THE LIMITS OF A DISTRIBUTION FUNCTION IF TWO
EXPECTED VALUES ARE GIVEN 


By R. v. Mises


In a very interesting paper¹ A. Wald dealt with the following generalization
of a problem started by Markoff and Tchebycheff: Denote by X a random
variable, by P(t) the probability of |X| < t and by $M_r$ the absolute moment
of order r, i.e. the expected value of $|X|^r$; what is the sharp lower limit
(limes inferior) of P(t) for any point t, if several of the moments $M_r$ are
given? Wald outlines an ingenious method for the general case of n given
moments and adds the complete solution for the case n = 2. I wish to show in
the following lines that the results for n = 2 can be deduced in a manner that
is both more general and less complicated. Instead of two different powers of
|X|, I shall admit largely arbitrary functions of X, and I shall reach the
solution in a more intuitive way. Moreover the upper limit of P(t) will be
found too. It seems to me that my method will be applicable also to certain
cases with n > 2.


1. The Problem. Without loss of generality we can restrict ourselves to a
non-negative random variable X. Let x(X) and y(X) be two increasing functions
of X with x(0) = y(0) = 0. We suppose that the curve defined in a Cartesian
co-ordinate system by

(1)  $x = x(t), \qquad y = y(t)$

is convex downwards, i.e. the slope of its chords increases if the
co-ordinates of one or both end points of the chord increase (see Fig. 1).
This condition is fulfilled, for instance, if $x = t^r$, $y = t^s$ and
s > r > 0, where the exponents r, s are not necessarily integers. Another
example is $x = t$, $y = t^2/(1 + t)$; here, however, the ratio y/x is
restricted to values between 0 and 1. In a third class of examples, such as
$x = t/(1 + t)$, $y = t^3/(1 + t)^3$, the curve corresponding to (1) ends at a
finite point.

The probability of the inequality X < t will be designated by
$\underline{P}(t)$, the probability of X > t by $\overline{P}(t)$. The sum of
$\underline{P}(t)$ and $\overline{P}(t)$ is equal to 1 except at the points t
associated with a finite probability. But in any case the upper limit of
$\underline{P}(t)$ and the lower limit of $\overline{P}(t)$ give the sum 1.

The expected values of x(X) and y(X) can be defined by means of
$\underline{P}(t)$ or $\overline{P}(t)$:

(2)  $a = \int_0^\infty x(t)\,d\underline{P}(t) = -\int_0^\infty x(t)\,d\overline{P}(t);$

     $b = \int_0^\infty y(t)\,d\underline{P}(t) = -\int_0^\infty y(t)\,d\overline{P}(t).$


¹ Ann. Math. Statist. Vol. 9 (1938) pp. 244-255.

We suppose that the values of a and b are given in a suitable manner and we
ask for the lower limit of $\underline{P}(t)$ and $\overline{P}(t)$ at any
point t. In other words, we try to find two functions $\underline{l}(t)$ and
$\overline{l}(t)$ so that for all distributions associated with the given
values of a and b we have

(3)  $\underline{P}(t) \ge \underline{l}(t), \qquad \overline{P}(t) \ge \overline{l}(t),$

but that these inequalities are not valid if $\underline{l}(t)$ and
$\overline{l}(t)$ are replaced by higher values. In Fig. 1, K is the curve
defined by (1) and C the point with co-ordinates a, b.

We can give a more intuitive interpretation to our problem by imagining a
mass distribution instead of a probability or frequency distribution. In fact,
if the mass of magnitude 1 is spread along the curve K in such a way that
$\underline{P}(t)$ designates the sum (or integral) of masses lying to the left
of the point x(t), y(t), then the point C will be the centre of gravity (centre
of mass) of the whole mass system. Incidentally, it follows that C must be
situated on the inner side of the convex curve K. Our question can now be
stated as follows:

A mass of size 1 is distributed along a given convex curve and has its centre 
of gravity in a given point C. What is the least possible value of mass lying 
to the left or to the right of any point x(t), y(t) of the curve? 


2. Restriction of Distributions to be Considered. The essential difficulty
of our problem lies in the fact that in order to find the limits
$\underline{l}(t)$ and $\overline{l}(t)$ all conceivable forms of distribution
functions $\underline{P}(t)$ and $\overline{P}(t)$ must be taken into account.
Let us now see how the field of distributions can be restricted in a decisive
manner.

Two mass systems with the same total mass and the same centre of gravity
will be called "equivalent systems". Then the following corollary can be
stated:

If a mass system with the distribution functions $\underline{P}$,
$\overline{P}$ and a point M with co-ordinates x(t), y(t) are given, we can
always find an equivalent system consisting of three particles or mass points:
a mass $m_1$ at $M_1$ to the left of M, a mass m at M itself and a mass $m_2$
at $M_2$ to the right of M, so that

(4)  $m_1 \le \underline{P}(t), \qquad m_2 \le \overline{P}(t).$

This proposition enables us, in asking for the lower limit values
$\underline{l}(t)$, $\overline{l}(t)$ at M, to confine ourselves to the
consideration of a special class of three-point systems and to disregard all
other kinds of distributions.

In order to prove the corollary we make use of the well known laws of
elementary statics. According to these laws all masses lying to the left of M
(in the given system) can be replaced by a single mass of the same size fixed
at their centre of gravity $C_1$ (Fig. 1). This centre is situated in the
domain between the curve K and the chord OM. The straight line $MC_1$ has one
and only one second point of intersection $M_1$ with K. Any mass at $C_1$ can
be decomposed into two masses, one of them of magnitude $m_1$ lying at $M_1$,
the other of size m' at M. In an analogous manner, starting with the masses
lying to the right of M in the given system, $m_2$, m'' and $M_2$ can be
found. It is evident that $m_1$ cannot exceed the sum of masses which were
attached to points to the left of M in the original mass system. It is the
same with $m_2$ and the masses to the right of M. If in the original system a
finite mass $m_0$ had been attached to the point M, the value of m in the new
mass system will be defined as m = m' + m'' + $m_0$.
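The reduction used in this proof can be tried numerically. What follows is only a sketch under stated assumptions: the curve K is taken to be y = x^3 (the example curve of Section 4), the given system is a finite set of point masses, and the names `curve`, `second_point` and `three_point_system` are hypothetical helpers introduced for illustration. Each side of M is replaced by its centre of gravity, which is then split by the lever rule between M and the second intersection of the connecting line with K.

```python
import math

def curve(x):
    # Assumed example curve K: y = x**3 (convex for x >= 0).
    return x ** 3

def second_point(xm, cx, cy):
    # Abscissa of the second intersection with K of the line through
    # M = (xm, xm**3) and the centre of gravity (cx, cy).  For y = x**3 a
    # chord of slope s through M meets K again where x**2 + x*xm + xm**2 = s.
    s = (cy - curve(xm)) / (cx - xm)
    return (-xm + math.sqrt(4.0 * s - 3.0 * xm * xm)) / 2.0

def three_point_system(points, xm):
    # points: list of (abscissa, mass) on K; xm: abscissa of the point M.
    # Returns [(x1, m1), (xm, m), (x2, m2)], an equivalent three-point system.
    m = sum(w for x, w in points if x == xm)      # mass already attached to M
    ends = []
    for on_side in (lambda x: x < xm, lambda x: x > xm):
        group = [(x, w) for x, w in points if on_side(x)]
        mu = sum(w for _, w in group)
        if mu == 0.0:
            ends.append((xm, 0.0))                # nothing on this side of M
            continue
        cx = sum(w * x for x, w in group) / mu    # centre of gravity of the side
        cy = sum(w * curve(x) for x, w in group) / mu
        xe = second_point(xm, cx, cy)
        m_end = mu * (cx - xm) / (xe - xm)        # lever rule: share sent to the far point
        m += mu - m_end                           # the remainder is moved to M
        ends.append((xe, m_end))
    return [ends[0], (xm, m), ends[1]]

system = [(0.1, 0.2), (0.3, 0.1), (0.5, 0.2), (0.7, 0.1), (0.9, 0.4)]
reduced = three_point_system(system, 0.5)
```

In this sketch the total mass and both co-ordinates of the centre of gravity of `reduced` agree with those of `system`, and the two outer masses do not exceed the masses originally lying on the corresponding side of M, which is the content of (4).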


3. The Extreme Distributions. Now, in order to find the limits
$\underline{l}(t)$ and $\overline{l}(t)$ for a certain point M, we are
concerned exclusively with a two-parameter family of mass systems, each of
them consisting of three masses $m_1$, m, $m_2$ at three points $M_1$, M,
$M_2$. We choose as parameters the magnitude m of the mass attached to the
point M and the slope of the chord joining $M_1$ and $M_2$. If m remains
constant, the chord $M_1M_2$ (Fig. 2) passes through a fixed point $C_0$ on
the prolongation of MC, where $CC_0 = MC \cdot m/(1 - m)$. The masses $m_1$
and $m_2$ vary with the direction of $M_1M_2$ and are determined by

(5)  $m_1 = (1 - m)\,\frac{C_0M_2}{M_1M_2}, \qquad m_2 = (1 - m)\,\frac{M_1C_0}{M_1M_2}.$

Fig. 1          Fig. 2

We are only interested in the least possible values of $m_1$ and $m_2$. But a
convex curve for which the angle formed by its extreme tangents is not greater
than 90° has the characteristic property that the ratio of chord segments
$M_1C_0 : C_0M_2$, for an inner point $C_0$, is permanently increasing or
decreasing when the chord turns about $C_0$; there is no analytical maximum or
minimum. It follows that the lowest values of $m_1$ and $m_2$ can only be
found in an extreme position of the chord, i.e. when $M_1$ coincides with O,
or $M_2$ with the other (possibly infinite) end Q of K, or finally when one of
the points $M_1$, $M_2$ coincides with M. The latter cases must be mentioned
since it was one of the conditions for our three-point systems that M lies
between $M_1$ and $M_2$. The result we have obtained so far is that the lowest
values of masses lying on one or the other side of M are to be sought in a
distribution of one of the following classes: (1) the three-mass systems with
one mass at M and one mass at O; (2) the three-mass systems with one mass at M
and one mass at the end Q of K; (3) the two-mass systems with one mass at M.

Now we must distinguish three sorts of points M or three sections of the 
curve K. If we trace the chord (see Fig. 3) beginning at O and passing through 
C we obtain the point of intersection O’ and by means of the chord QC we arrive 
at the point Q’. The three sections of K we have to deal with are OQ’, Q’O’ 
and O’Q. 

If M is a point of OQ' there exists a chord MM' passing through C and
therefore a two-mass system with masses m, m' at M and M'. In this system the
mass to the left of M is zero, thus we have $\underline{l}(t) = 0$ for all
these points. If we consider a three-mass system with one mass at M, one mass
at O and one mass at any point $M_2$, the value of $m_2$ is equal to the ratio
$CC_1/M_2C_1$, where $C_1$ is the intersection of $M_2C$ with OM. The least
value of this ratio will be reached when $C_1$ coincides with M. Therefore
$\overline{l}(t)$ is equal to the ratio CM/M'M, or equal to the mass m' of the
two-mass system mentioned before.








Fig. 3


Now let M be a point of the arc Q'O'. For such a point a two-mass system
does not exist, since the straight line MC does not meet the curve a second
time. In a three-mass system O M $M_2$ the value of $m_2$ is equal to
$CC_1/M_2C_1$ as before, and the least value of this ratio is attained if
$M_2$ coincides with Q. It follows that $\overline{l}(t)$ is equal to the
ratio $CC_0/QC_0$, where $C_0$ denotes the point of intersection of QC with
OM. In the same way we find $\underline{l}(t)$ equal to $CC^0/OC^0$, the point
of intersection of OC with MQ being designated by $C^0$.

For a point M of the arc O'Q the circumstances are the same as for the points
of OQ'.

In other words the extreme distributions which furnish immediately the
values of $\underline{l}(t)$ and $\overline{l}(t)$ are 1) the two-mass systems
MM' for all points of the arcs OQ' and O'Q and 2) the three-mass system OMQ
for a point of the middle section Q'O'. The corresponding values of
$\underline{l}$ and $\overline{l}$ are to be found by the elementary laws of
statics in the simplest way.


4. Results. The definite results can now be stated as follows. Our data
are the functions x(t), y(t) and the expected values a, b.

First we compute the co-ordinates p, q of the endpoint Q, i.e.
$p = x(\infty)$, $q = y(\infty)$. If q or p and q are infinite, we only need
the limit value of y/x. Then the two values $t_0$ and $t^0$ corresponding to
the points O' and Q' have to be found. They are determined by the equations

(6)  $\frac{y(t_0)}{x(t_0)} = \frac{b}{a}, \qquad \frac{y(t^0) - b}{x(t^0) - a} = \frac{b - q}{a - p}.$

If t belongs to one of the intervals $t \le t^0$ or $t \ge t_0$, there exists
one and only one value of t' different from t and satisfying the equation

(7)  $\frac{a - x(t)}{x(t') - x(t)} = \frac{b - y(t)}{y(t') - y(t)}.$

The point M' with co-ordinates x' = x(t'), y' = y(t') is the second endpoint
of the chord passing through M and C. Now we have, according to the preceding
considerations:

(8)  For $t \le t^0$:  $\underline{l}(t) = 0, \qquad \overline{l}(t) = \frac{a - x}{x' - x};$

     for $t^0 \le t \le t_0$:  $\underline{l}(t) = \frac{(p - a)(y - b) - (q - b)(x - a)}{py - qx}, \qquad \overline{l}(t) = \frac{ay - bx}{py - qx};$

     for $t \ge t_0$:  $\underline{l}(t) = \frac{x - a}{x - x'}, \qquad \overline{l}(t) = 0.$

The formulae are considerably simplified if p, q are infinite. In the case
of two given moments, $x = t^r$, $y = t^s$, s > r > 0, we have $p = \infty$,
$q = \infty$, $\lim y/x = \infty$. The second equation (6) gives
$x(t^0) = a$ and (8) becomes:

     For $t \le t^0$:  $\underline{l}(t) = 0, \qquad \overline{l}(t) = \frac{a - x}{x' - x};$

     for $t^0 \le t \le t_0$:  $\underline{l}(t) = \frac{x - a}{x}, \qquad \overline{l}(t) = 0;$

     for $t \ge t_0$:  $\underline{l}(t) = \frac{x - a}{x - x'}, \qquad \overline{l}(t) = 0.$

The values of $\underline{l}(t)$ given here are in full accordance with the
results published by Wald in his paper quoted above.

A great part of the numerical investigation is independent of the relation
which joins x (or y) to t and is determined only by the values a, b, the curve
K, i.e. the relation between x and y, and its endpoint Q. In the following
example we have assumed $y = x^3$ and as endpoint p = q = 1. Fig. 4 shows for
a = 0.6, b = 0.4 the three sections of the lines $\underline{l}$ and
$1 - \overline{l}$ according to the equations (8), but with the abscissae x.
The graph of any distribution function in the interval $0 \le x \le 1$ with
given first moment 0.6 and third moment 0.4 keeps within the space between the
lines $\underline{l}$ and $1 - \overline{l}$. If we now assume, e.g.,
$x = t^2/(1 + t^2)$, the abscissae x are to be transformed according to this
equation and the graphs of the definitive $\underline{l}(t)$ and
$1 - \overline{l}(t)$ functions are those given in Fig. 5. Any distribution
function $\underline{P}(t)$ with the expected values

     $\int \frac{t^2}{1 + t^2}\,d\underline{P}(t) = 0.6, \qquad \int \left(\frac{t^2}{1 + t^2}\right)^3 d\underline{P}(t) = 0.4$

must keep between the two limits indicated in Fig. 5. If such a function
touches the upper limit in any point, it will also attain the lower limit in
another point and will correspond to a two- or three-mass system.
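The limits of this example can be evaluated directly from the statics of Section 3. The block below is a minimal sketch, assuming the curve y = x^3 with endpoint p = q = 1 and a = 0.6, b = 0.4 as in the text, and working with the abscissa x instead of t. The names `second_intersection`, `lower_limits`, `X_Q_PRIME` and `X_O_PRIME` are hypothetical, and the closed form used for the second intersection is specific to the cubic curve.

```python
import math

A, B = 0.6, 0.4            # given expected values a, b
P_END, Q_END = 1.0, 1.0    # co-ordinates p, q of the endpoint Q

# Abscissa of Q' (line QC): x**3 - 1.5*x + 0.5 = 0 has the root (sqrt(3) - 1)/2.
X_Q_PRIME = (math.sqrt(3.0) - 1.0) / 2.0
# Abscissa of O' (line OC): x**3 / x = b / a.
X_O_PRIME = math.sqrt(B / A)

def second_intersection(x):
    # Abscissa x' of the second endpoint M' of the chord through M and C.
    s = (B - x ** 3) / (A - x)            # slope of the line MC
    return (-x + math.sqrt(4.0 * s - 3.0 * x * x)) / 2.0

def lower_limits(x):
    # Returns (least mass to the left of M, least mass to the right of M)
    # for the point M = (x, x**3), following the three cases of Section 3.
    y = x ** 3
    if x <= X_Q_PRIME:                    # arc OQ': two-mass system M, M'
        xp = second_intersection(x)
        return 0.0, (A - x) / (xp - x)
    if x >= X_O_PRIME:                    # arc O'Q: two-mass system M', M
        xp = second_intersection(x)
        return (x - A) / (x - xp), 0.0
    det = P_END * y - Q_END * x           # arc Q'O': three-mass system O, M, Q
    left = ((P_END - A) * (y - B) - (Q_END - B) * (x - A)) / det
    right = (A * y - B * x) / det
    return left, right

left_half, right_half = lower_limits(0.5)
```

For x = 0.5 the sketch gives the three-mass system with mass 2/15 at O, 8/15 at M and 1/3 at Q, whose centre of gravity is indeed (0.6, 0.4); a distribution function with these two expected values therefore lies between 2/15 and 2/3 at that abscissa.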


UNIVERSITY OF ISTANBUL, 
ISTANBUL, TURKEY. 





CONFIDENCE LIMITS FOR CONTINUOUS DISTRIBUTION FUNCTIONS¹
By A. Wald² and J. Wolfowitz


1. Introduction. The theory of confidence limits for unknown parameters
of distribution functions has been considerably developed in recent years. This
theory assumes that there is given a family F of systems of n stochastic
variables $X_1(\theta_1, \ldots, \theta_k), \ldots, X_n(\theta_1, \ldots, \theta_k)$
depending upon k parameters $\theta_1, \ldots, \theta_k$ and such that the
distribution function of every element of F is known.

For the case k = 1, for example, this theory proceeds as follows:

Denote by E an n-tuple $x_1, \ldots, x_n$ of observed values of the stochastic
variables $X_1(\theta), \ldots, X_n(\theta)$ of which we know only that they
constitute a system which is an element of F. E can be represented as the
point $x_1, \ldots, x_n$ in an n-dimensional Euclidean space. Let there be
given a positive number $\alpha$, $0 < \alpha < 1$. Then to each pair E,
$\alpha$ there is constructed a $\theta$-interval
$[\underline{\theta}(E, \alpha), \overline{\theta}(E, \alpha)]$ with the
following property: If we were to draw a sample from the system
$X_1(\theta), \ldots, X_n(\theta)$, the probability is exactly $\alpha$ that we
shall get a system of observations $E = x_1, \ldots, x_n$ such that the
interval corresponding to E, $\alpha$ will include $\theta$ (i.e., that
$\underline{\theta}(E, \alpha) \le \theta \le \overline{\theta}(E, \alpha)$).

In this paper we do not limit ourselves to a family of systems of n stochastic
variables depending upon a finite number of parameters, but consider the family
G of all systems of n stochastic variables $X_1, \ldots, X_n$ subject only to
the condition that $X_1, \ldots, X_n$ are independently distributed with the
same continuous distribution function.

Let E be the point in an n-dimensional Euclidean space which corresponds to
the observed values $x_1, \ldots, x_n$ of the n stochastic variables
$X_1, \ldots, X_n$ of which we know only that they constitute an element of the
family G, i.e., that they are independently distributed with the same
continuous distribution function. Let us denote their distribution function by
f(x); the probability that $X_i < x$ is f(x), $i = 1, \ldots, n$. Let $\alpha$
be a number such that $0 < \alpha < 1$. To each pair E, $\alpha$ we shall
construct two functions, $\overline{l}_{E,\alpha}(x)$ and
$\underline{l}_{E,\alpha}(x)$, with the following property: The probability is
$\alpha$ that, if we were to draw a sample from the system $X_1, \ldots, X_n$,
we would get a system of observations $E = x_1, \ldots, x_n$ such that f(x)
lies entirely between $\overline{l}_{E,\alpha}(x)$ and
$\underline{l}_{E,\alpha}(x)$ (i.e., that
$\underline{l}_{E,\alpha}(x) \le f(x) \le \overline{l}_{E,\alpha}(x)$ for all
x). We shall call $\overline{l}_{E,\alpha}(x)$ and
$\underline{l}_{E,\alpha}(x)$ the upper and lower confidence limits,
respectively, corresponding to the confidence coefficient $\alpha$.


¹ Presented to the American Mathematical Society at New York, February 25, 1939.
² Research under a grant-in-aid from the Carnegie Corporation of New York.

All the stochastic variables considered hereafter in this paper are to have
continuous distribution functions.


2. A theorem on continuous distribution functions. Let f(x) be the
continuous distribution function of a stochastic variable X whose range is
from $-\infty$ to $+\infty$. Let $\delta_1(x)$ and $\delta_2(x)$ be two
functions defined for $0 \le x \le 1$ and satisfying the following
requirements:

(a) $\delta_1(x)$ and $\delta_2(x)$ are non-negative and continuous for $0 \le x \le 1$.

(b) $l_1(x)$ and $l_2(x)$ are monotonically non-decreasing for all x, where

     $l_1(x) = f(x) + \delta_1(f(x))$

     $l_2(x) = f(x) - \delta_2(f(x)).$

(c) There exists a number h such that f(h) < 1 and $l_1(h) = 1$.

(d) There exists a number h' such that f(h') > 0 and $l_2(h') = 0$.

(e) $l_1(x) \le 1$ for all x,
    $l_2(x) \ge 0$ for all x.

(f) $\delta_1(x) + \delta_2(x) \ge \frac{1}{n}$ for all x, where n is the number of random,
independent observations of the stochastic variable X.

Let $\varphi(x)$ be the distribution function of such a system of
observations, i.e., the ratio, to n, of the number of observations < x is
$\varphi(x)$. $\varphi(x)$ is, of course, a multiple of $\frac{1}{n}$ for all x.


We shall consider the following problem:
What is the probability P that

(1)  $l_2(x) \le \varphi(x) \le l_1(x)$

for all x?

The reasons for restrictions (b), (c), (d), (e), and (f) on $\delta_1(x)$ and
$\delta_2(x)$ are now apparent. If there exist two numbers $q_1 < q_2$ such
that, for $q_1 < x < q_2$, $l_1(x) > l_1(q_2)$ and $l_1(q_1) = l_1(q_2)$,
then, if we change $l_1(x)$ so that $l_1(x) = l_1(q_1)$ for $q_1 < x < q_2$,
P will remain unchanged. An analogous process leads to a similar conclusion
for $l_2(x)$. Hence $l_1(x)$ and $l_2(x)$ are to be monotonically
non-decreasing. If there did not exist a number h or h', P would be 0. Hence
requirements (c) and (d). Since $0 \le \varphi(x) \le 1$, there is no point to
considering functions which do not satisfy (e). $\varphi(x)$ is a step-function
whose saltuses are $\ge \frac{1}{n}$. If, for all x,
$\delta_1(x) + \delta_2(x) < \frac{1}{n}$, then P = 0. If there is an interval
$[\beta, \gamma]$ within which $\delta_1(x) + \delta_2(x) < \frac{1}{n}$, then
all samples in which one of the observed values lies in this interval are such
that (1) does not hold for all x. For the sake of simplicity and because the
situation described in (f) is the one of importance, we make the latter
requirement.

It would appear that P depends upon f(x), $\delta_1(x)$, $\delta_2(x)$, and n.

THEOREM: P is independent of f(x) and depends only upon $\delta_1(x)$,
$\delta_2(x)$, and n.

PROOF: Let Y = f(X). Then Y is a stochastic variable distributed in the
range 0 to 1 with a distribution function $\equiv x$. By this transformation
$l_1(x)$ and $l_2(x)$ become respectively

(2)  $l_1'(x) = x + \delta_1(x), \qquad l_2'(x) = x - \delta_2(x) \qquad (0 \le x \le 1).$

Then P is the probability that the distribution function $\varphi(x)$ of a
random sample of n of the stochastic variable Y shall be such that
$l_2'(x) \le \varphi(x) \le l_1'(x)$ and is therefore independent of f(x).
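The transformation Y = f(X) in this proof can be illustrated numerically. The sketch below is only an illustration under assumptions not made in the text: a constant band width (so that (1) amounts to bounding the largest deviations of the sample distribution from f in each direction), an exponential population as the example of a continuous f, an arbitrary seed, and a hypothetical helper `largest_deviations`.

```python
import math
import random

def largest_deviations(sample, cdf):
    # Returns (max of phi - f, max of f - phi) taken over all x, where phi is
    # the sample distribution function and f = cdf.  Both extremes are attained
    # at the observed values, so it suffices to look at the ordered f-values.
    ys = sorted(cdf(x) for x in sample)
    n = len(ys)
    above = max((i + 1) / n - y for i, y in enumerate(ys))
    below = max(y - i / n for i, y in enumerate(ys))
    return above, below

rng = random.Random(1939)                      # arbitrary seed
u = [rng.random() for _ in range(25)]          # sample of Y, uniform on (0, 1)
x = [-math.log(1.0 - v) for v in u]            # the same sample carried over to an exponential X

dev_uniform = largest_deviations(u, lambda t: t)
dev_exponential = largest_deviations(x, lambda t: 1.0 - math.exp(-t))
```

Up to rounding, `dev_uniform` and `dev_exponential` coincide, so any event defined through these deviations has the same probability for both populations.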


3. Computation of P. From the previous section it follows that, in
computing P, we may confine ourselves to a stochastic variable X whose range
is from 0 to 1 and whose distribution function $\equiv x$. Let $l_1(x)$ and
$l_2(x)$ be the upper and lower limits, respectively, which are set for
$\varphi(x)$. $l_1(x)$ and $l_2(x)$ are defined in (2), if the accents are
omitted.

Consider the equations:

(3)  $l_1(x) = \frac{i}{n} \qquad (i = 1, 2, \ldots, n;\ 0 \le x \le 1).$

If, for a certain i, the corresponding equation possesses one or more solutions
in x, let $a_i$ be the minimum of these solutions. If the first r of these
equations (3) have no solutions, let

     $a_i = 0 \qquad (i = 1, \ldots, r).$

If the i-th, say, of the equations

(4)  $l_2(x) = \frac{i - 1}{n} \qquad (i = 1, \ldots, n;\ 0 \le x \le 1)$

possesses one or more solutions in x, let $b_i$ be the maximum of these. If the
last n - s of the equations (4) have no solutions, let

     $b_i = 1 \qquad (i = s + 1, \ldots, n).$

Obviously

     $a_i \le a_{i+1}, \qquad b_i \le b_{i+1}, \qquad a_i < b_i.$

From restrictions (e) and (f) on $l_1(x)$ and $l_2(x)$, it follows that
$a_1 = 0$, $b_n = 1$.

Suppose the sample $E = x_1, \ldots, x_n$ has been obtained. Arrange the x's
in ascending order, thus: $x_{p_1}, x_{p_2}, \ldots, x_{p_n}$, where
$x_{p_1} \le x_{p_2} \le \cdots \le x_{p_n}$. Then necessary and sufficient
conditions that (1) hold are:

(5)  $a_i \le x_{p_i} \le b_i \qquad (i = 1, \ldots, n).$
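As an illustration of (3), (4) and (5), the following sketch computes the $a_i$ and $b_i$ for one particular pair of limits and checks a sample against (5). It assumes, beyond what this section states, the constant-width limits $l_1(x) = \min[x + d, 1]$ and $l_2(x) = \max[x - d, 0]$ that the paper uses later; the function names are hypothetical.

```python
from fractions import Fraction

def order_statistic_limits(n, d):
    # a_i: least x with min(x + d, 1) = i/n, or 0 if no such x exists;
    # b_i: greatest x with max(x - d, 0) = (i - 1)/n, or 1 if no such x exists.
    a = [max(Fraction(i, n) - d, Fraction(0)) for i in range(1, n + 1)]
    b = [min(Fraction(i - 1, n) + d, Fraction(1)) for i in range(1, n + 1)]
    return a, b

def satisfies_conditions(sample, d):
    # Conditions (5): the i-th smallest observation must lie between a_i and b_i.
    xs = sorted(sample)
    a, b = order_statistic_limits(len(xs), d)
    return all(lo <= x <= hi for lo, x, hi in zip(a, xs, b))

a, b = order_statistic_limits(6, Fraction(1, 2))
inside = satisfies_conditions([0.1, 0.3, 0.45, 0.6, 0.8, 0.95], Fraction(1, 2))
outside = satisfies_conditions([0.6, 0.7, 0.8, 0.85, 0.9, 0.95], Fraction(1, 2))
```

For n = 6 and d = 1/2 this gives a = (0, 0, 0, 1/6, 1/3, 1/2) and b = (1/2, 2/3, 5/6, 1, 1, 1); the second sample fails (5) because its smallest value exceeds $b_1$.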


Let $P_k(t, \Delta t)$ $(k = 0, 1, \ldots, (n - 1);\ a_{k+1} \le t \le b_{k+1})$
be the probability that a sample $E = x_1, \ldots, x_n$ shall fulfill the
following conditions:

(a) $x_1 \le x_2 \le \cdots \le x_{k+1}$,

(b) $x_1, \ldots, x_k$ satisfy the first k inequalities (5),

(c) $t \le x_{k+1} \le t + \Delta t$.

Let

     $P_k(t) = \lim_{\Delta t \to 0} \frac{P_k(t, \Delta t)}{\Delta t}.$

Since $f(x) \equiv x$, we get easily

(6)  $P_0(t) = 1.$

We shall now develop a recursion formula for $P_{k+1}(t)$. For this purpose let
us consider the following composite event: The observations
$x_1, \ldots, x_{k+1}$ satisfy the conditions (a), (b), and

     $t' \le x_{k+1} \le t' + \Delta t'$

and

     $t \le x_{k+2} \le t + \Delta t.$

If $a_{k+1} \le t' \le b_{k+1}$, the probability of this event is
$P_k(t', \Delta t')\,\Delta t$. Now

     $\lim_{\Delta t' \to 0,\ \Delta t \to 0} \frac{P_k(t', \Delta t')\,\Delta t}{\Delta t' \cdot \Delta t} = P_k(t').$

$P_k(t')$ is obviously the probability density of the bivariate distribution of
t' and t. In order to obtain $P_{k+1}(t)$ we have to integrate $P_k(t')\,dt'$
over the region defined by the two inequalities

     $t' \le t$

     $a_{k+1} \le t' \le b_{k+1}.$

Hence, omitting the now unnecessary accent, if

(7)  $t \le b_{k+1}$

then

(8)  $P_{k+1}(t) = \int_{a_{k+1}}^{t} P_k(t)\,dt \qquad (k = 0, 1, \ldots, (n - 2)),$

and if

(9)  $t > b_{k+1}$

then

(10)  $P_{k+1}(t) = \int_{a_{k+1}}^{b_{k+1}} P_k(t)\,dt \qquad (k = 0, 1, 2, \ldots, (n - 2)).$

Now, to obtain P, we cannot confine ourselves only to cases where
$x_1 \le x_2 \le \cdots \le x_n$, but have to consider all the n! permutations
of the n x's. Hence

(11)  $P = n! \int_{a_n}^{b_n} P_{n-1}(t)\,dt.$


The fact that there are two forms of the recursion formula corresponding to 
the two possible cases (7) and (9) makes actual calculation very cumbersome 
for n of any considerable size. We shall therefore give an approximation 
formula which is considerably easier to apply to practical calculations. 


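4. Computation of $\overline{P}$ and $\underline{P}$. Let $\overline{P}$ be
the probability that, for a sample of n, $l_1(x) \ge \varphi(x)$ for all x. Let
$\underline{P}$ be the probability that, for a sample of n,
$\varphi(x) \ge l_2(x)$ for all x.

Consider the inequalities

(12)  $x_i \ge a_i \qquad (i = 1, 2, \ldots, n)$

(13)  $x_i \le b_i \qquad (i = 1, 2, \ldots, n).$

Let $\overline{P}_k(t, \Delta t)$ $(k = 0, 1, \ldots, (n - 1);\ t \ge a_{k+1})$
be the probability that a sample $E = x_1, \ldots, x_n$ of the stochastic
variable X should fulfill the following conditions:

(a) $x_1 \le x_2 \le \cdots \le x_{k+1}$,

(b) $x_1, \ldots, x_k$ satisfy the first k inequalities (12),

(c) $t \le x_{k+1} \le t + \Delta t$.

Let

     $\overline{P}_k(t) = \lim_{\Delta t \to 0} \frac{\overline{P}_k(t, \Delta t)}{\Delta t}.$

Then, by an argument like that employed in the preceding section, we obtain

(14)  $\overline{P}_0(t) = 1$

and the recursion formula

(15)  $\overline{P}_{k+1}(t) = \int_{a_{k+1}}^{t} \overline{P}_k(t)\,dt.$

Let $\overline{P}_n(t)$ be defined formally by (15). Then, in the same way in
which we obtained (11), we get

(16)  $\overline{P} = n!\,\overline{P}_n(1).$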
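The recursion (15) and formula (16) lend themselves to exact computation, since every iterate is a polynomial in t. The following is a minimal sketch using rational arithmetic; the representation of polynomials as coefficient lists and the names `integrate`, `evaluate` and `upper_one_sided_probability` are choices made here for illustration, not part of the paper.

```python
from fractions import Fraction
from math import factorial

def integrate(poly):
    # poly[i] is the coefficient of t**i; returns the antiderivative
    # with constant term 0.
    return [Fraction(0)] + [c / (i + 1) for i, c in enumerate(poly)]

def evaluate(poly, t):
    return sum(c * t ** i for i, c in enumerate(poly))

def upper_one_sided_probability(a):
    # a = [a_1, ..., a_n].  Applies (15) n times starting from the constant
    # polynomial 1 of (14), then returns n! times the last polynomial at t = 1,
    # as in (16).
    poly = [Fraction(1)]
    for a_k in a:
        anti = integrate(poly)
        anti[0] -= evaluate(anti, a_k)    # make the integral start at a_k
        poly = anti
    return factorial(len(a)) * evaluate(poly, Fraction(1))

a_half = [Fraction(0), Fraction(0), Fraction(0),
          Fraction(1, 6), Fraction(1, 3), Fraction(1, 2)]
p_half = upper_one_sided_probability(a_half)
```

With the values $a_1 = a_2 = a_3 = 0$, $a_4 = 1/6$, $a_5 = 1/3$, $a_6 = 1/2$ used in the example of Section 9, the sketch returns 2507/2592, i.e. about 0.967, the value reported there.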


In the same manner we shall obtain an expression for $\underline{P}$.

Let $\underline{P}_k(t, \Delta t)$ $(k = 0, 1, \ldots, (n - 1);\ t \le b_{n-k})$
be the probability that a sample $E = x_1, \ldots, x_n$ of the stochastic
variable X should fulfill the following conditions:

(a) $x_{n-k} \le x_{n-k+1} \le \cdots \le x_n$,

(b) $x_{n-k+1}, \ldots, x_n$ satisfy the last k inequalities (13),

(c) $t \le x_{n-k} \le t + \Delta t$.

Let

     $\underline{P}_k(t) = \lim_{\Delta t \to 0} \frac{\underline{P}_k(t, \Delta t)}{\Delta t}.$

Then

(17)  $\underline{P}_0(t) = 1$

and by an argument very similar to that employed above,

(18)  $\underline{P}_{k+1}(t) = \int_{t}^{b_{n-k}} \underline{P}_k(t)\,dt.$

Let $\underline{P}_n(t)$ be defined formally by (18). Then

(19)  $\underline{P} = n!\,\underline{P}_n(0).$


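The $\overline{P}_i(t)$ and $\underline{P}_i(t)$ are polynomials in t. Denote
by $c_i$ the constant term of $\overline{P}_i(t)$ and by $d_i$ the constant
term of $(-1)^i \underline{P}_i(t)$. Obviously

(20)  $c_0 = 1$

and

(21)  $d_0 = 1$

(22)  $\overline{P}_i(t) = \frac{t^i}{i!} + c_1 \frac{t^{i-1}}{(i-1)!} + \cdots + c_{i-1}\,t + c_i$

(23)  $\underline{P}_i(t) = (-1)^i \left( \frac{t^i}{i!} + d_1 \frac{t^{i-1}}{(i-1)!} + \cdots + d_{i-1}\,t + d_i \right).$

Since

     $\overline{P}_i(a_i) = 0, \qquad \underline{P}_i(b_{n-i+1}) = 0,$

we obtain

(24)  $\frac{a_i^i}{i!} + c_1 \frac{a_i^{i-1}}{(i-1)!} + \cdots + c_{i-1}\,a_i + c_i = 0 \qquad (i = 1, 2, \ldots, n)$

and

(25)  $\frac{b_{n-i+1}^i}{i!} + d_1 \frac{b_{n-i+1}^{i-1}}{(i-1)!} + \cdots + d_{i-1}\,b_{n-i+1} + d_i = 0 \qquad (i = 1, 2, \ldots, n).$

The determinant of (20) and the first j equations (24) $(j = 1, \ldots, n)$,
considered as equations in $c_0, c_1, \ldots, c_j$, equals 1, since all the
elements of the principal diagonal are 1 and all the elements above the
principal diagonal are 0. Then

(26)  $c_j = \begin{vmatrix} 1 & 0 & 0 & \cdots & 0 & 1 \\ a_1 & 1 & 0 & \cdots & 0 & 0 \\ \frac{a_2^2}{2!} & a_2 & 1 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & & \vdots & \vdots \\ \frac{a_j^j}{j!} & \frac{a_j^{j-1}}{(j-1)!} & \frac{a_j^{j-2}}{(j-2)!} & \cdots & a_j & 0 \end{vmatrix}.$

From (16) and (22) for i = n, we get

(27)  $\overline{P} = 1 + n c_1 + n(n-1) c_2 + \cdots + n(n-1)\cdots(3)(2)\,c_{n-1} + n!\,c_n = \begin{vmatrix} \frac{n!}{n!} & \frac{n!}{(n-1)!} & \frac{n!}{(n-2)!} & \cdots & \frac{n!}{1!} & n! \\ a_1 & 1 & 0 & \cdots & 0 & 0 \\ \frac{a_2^2}{2!} & a_2 & 1 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & & \vdots & \vdots \\ \frac{a_n^n}{n!} & \frac{a_n^{n-1}}{(n-1)!} & \frac{a_n^{n-2}}{(n-2)!} & \cdots & a_n & 1 \end{vmatrix}.$

In the same way, we obtain

(28)  $d_i = \begin{vmatrix} 1 & 0 & 0 & \cdots & 0 & 1 \\ b_n & 1 & 0 & \cdots & 0 & 0 \\ \frac{b_{n-1}^2}{2!} & b_{n-1} & 1 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & & \vdots & \vdots \\ \frac{b_{n-i+1}^i}{i!} & \frac{b_{n-i+1}^{i-1}}{(i-1)!} & \frac{b_{n-i+1}^{i-2}}{(i-2)!} & \cdots & b_{n-i+1} & 0 \end{vmatrix}$

and from (19) and (23) for i = n,

(29)  $\underline{P} = (-1)^n\,n!\,d_n.$

Perhaps if the determinants in (27) and (28) were to be simplified it might
be easier to calculate $\overline{P}$ and $\underline{P}$ that way than by the
recursion formulas.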
5. The approximation of P. Let J be the probability that, for a sample of n,
there exists at least one pair of numbers $\omega_1$, $\omega_2$ such that

     $0 \le \omega_i \le 1 \qquad (i = 1, 2)$
     $\varphi(\omega_1) > l_1(\omega_1)$
     $\varphi(\omega_2) < l_2(\omega_2).$

Recalling the definitions of P, $\overline{P}$, and $\underline{P}$, it is
obvious that

(30)  $1 - P = (1 - \overline{P}) + (1 - \underline{P}) - J.$

Now if

(31)  $J \le (1 - \overline{P})(1 - \underline{P})$

and (1 - P) is small, the right member of (30) with J omitted furnishes an
excellent approximation to (1 - P). Suppose, for example, that it were
desired to give upper and lower limits $l_1(x)$ and $l_2(x)$ such that
P = .95. Choose $l_1(x)$ and $l_2(x)$ so that, for example,
$\overline{P} = \underline{P} = .975$. Then P cannot differ from .95 by more
than .000625. Even if

(32)  $J < K(1 - \overline{P})(1 - \underline{P})$

where K is a small factor, say 10, the approximation would still be excellent.
It seems very plausible that even (31) holds. However, we have not yet
succeeded in obtaining a rigorous proof.
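The figure .000625 in this example follows from (30) and (31) by a short computation, written out here as a worked step with the values $\overline{P} = \underline{P} = .975$ of the text:

```latex
\begin{aligned}
1 - P &= (1 - \overline{P}) + (1 - \underline{P}) - J = 0.025 + 0.025 - J = 0.05 - J, \\
0 \le J &\le (1 - \overline{P})(1 - \underline{P}) = (0.025)^2 = 0.000625, \\
0.95 \le P &\le 0.950625 .
\end{aligned}
```

Under the weaker bound (32) with K = 10 the same computation would give a discrepancy of at most .00625.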


6. The construction of confidence limits. We now proceed to the construction
of $\overline{l}_{E,\alpha}(x)$ and $\underline{l}_{E,\alpha}(x)$ which were
defined in Section 1 of this paper.

A confidence coefficient $\alpha$ $(0 < \alpha < 1)$ is selected to which it
is desired that the confidence limits correspond. Functions $\delta_1(x)$ and
$\delta_2(x)$ are chosen to be as defined in Section 2 and also to be such as
to make $P = \alpha$. This can be done by application of the formulas for the
evaluation of P.

The functions $\overline{l}_{E,\alpha}(x)$ and $\underline{l}_{E,\alpha}(x)$
are to be known when E and $\alpha$ are known. Since $\alpha$ is given,
$\overline{l}_{E,\alpha}(x)$ and $\underline{l}_{E,\alpha}(x)$ depend upon the
outcome of the experiment which yields observed values of the stochastic
variable X. Let $E = x_1, \ldots, x_n$ be this system of values and let
$\varphi(x)$ be its distribution function. Consider the equations

(33)  $\delta_2(\varphi(x) + \Delta_1(x)) = \Delta_1(x)$
(34)  $\delta_1(\varphi(x) - \Delta_2(x)) = \Delta_2(x).$

For a fixed but arbitrary x, $-\infty < x < +\infty$, $\varphi(x)$ is known
and (33) and (34) are equations in $\Delta_1(x)$ and $\Delta_2(x)$. If, for a
certain x, (33) has one or more solutions, let $\epsilon_1(x)$ be the maximum
of the set of solutions (for this x, of course). Similarly, if for a certain
x, (34) has one or more solutions, let $\epsilon_2(x)$ be the maximum of the
set of solutions.

We can now give $\overline{l}_{E,\alpha}(x)$ and $\underline{l}_{E,\alpha}(x)$
as follows:

For an x such that (33) has at least one solution,

(35)  $\overline{l}_{E,\alpha}(x) = \varphi(x) + \epsilon_1(x).$

For an x such that (33) has no solutions,

(36)  $\overline{l}_{E,\alpha}(x) = 1.$

For an x such that (34) has at least one solution,

(37)  $\underline{l}_{E,\alpha}(x) = \varphi(x) - \epsilon_2(x).$

For an x such that (34) has no solution,

(38)  $\underline{l}_{E,\alpha}(x) = 0.$


We recapitulate briefly the meaning of $\overline{l}_{E,\alpha}(x)$ and
$\underline{l}_{E,\alpha}(x)$ which were defined in Section 1. These are two
functions defined for $-\infty < x < +\infty$ which may be constructed as
above after a confidence coefficient $\alpha$ has been assigned and after the
outcome of the physical experiment which determines the stochastic point E is
known. These functions have the following property: No matter what the
distribution function f(x) of each of n stochastic independent variables
$X_1, \ldots, X_n$ may be, provided only that f(x) is continuous and the same
for each $X_1, \ldots, X_n$, the probability is exactly $\alpha$ that, if we
were to perform the physical experiment which gives a set of values E of the
stochastic system $X_1, \ldots, X_n$ and were then to construct
$\overline{l}_{E,\alpha}(x)$ and $\underline{l}_{E,\alpha}(x)$, the inequality

(39)  $\underline{l}_{E,\alpha}(x) \le f(x) \le \overline{l}_{E,\alpha}(x)$

would hold for all x.

A less precise but more intuitive statement of the above result is as follows:
If, in many experiments we were to proceed as above to construct
$\overline{l}_{E,\alpha}(x)$ and $\underline{l}_{E,\alpha}(x)$ and then, in
each instance, we were to predict that the unknown f(x) (which need not be the
same in all experiments) satisfies (39), the relative frequency of correct
predictions would be $\alpha$.

The formal proof of this result is exceedingly simple. For any continuous
f(x), the probability is $\alpha$ that

(40)  $l_2(x) \le \varphi(x) \le l_1(x)$

will hold for all x. This is so because of the way in which $\delta_1(x)$ and
$\delta_2(x)$ were chosen. To prove the required result it would therefore be
sufficient to show that, if (39) holds for all x, (40) holds for all x and
conversely.

Let x be fixed but arbitrary. We shall show that

(41)  $f(x) \le \overline{l}_{E,\alpha}(x)$

implies

(42)  $l_2(x) \le \varphi(x)$

and conversely.

If (33) has no solution, $\varphi(x) > l_2'(1) \ge l_2(x)$,
$\overline{l}_{E,\alpha}(x) = 1$, and (41) and (42) are trivial. Assume
therefore that (33) has at least one solution. For this situation, then, we
have to show that

(43)  $f(x) \le \varphi(x) + \epsilon_1(x)$

implies

(44)  $l_2(x) \le \varphi(x)$

and conversely.

With x and hence $\varphi(x)$ and $\epsilon_1(x)$ fixed, consider the equation
in x':

(45)  $l_2(x') = \varphi(x).$

Since $\varphi(x) \le l_2'(1)$, (45) has at least one solution. Let $x_m$ be
the maximum of these solutions for a fixed x. Then from the definition of
$\epsilon_1(x)$ it follows that

(46)  $f(x_m) - l_2(x_m) = \epsilon_1(x),$

or, on account of the definition of $x_m$,

(47)  $f(x_m) = \varphi(x) + \epsilon_1(x).$

Now, if (43) holds, $x \le x_m$ because of (47). Then, from the definition of
$x_m$ and the fact that $l_2(x')$ is monotonically non-decreasing, (44)
follows.

If (44) holds, then $x \le x_m$ (by the definition of $x_m$ and the monotonic
character of $l_2(x')$). Hence, because of (47), (43) is true. This shows the
equivalence of (43) and (44).

In a similar manner, it may be shown that

(48)  $\underline{l}_{E,\alpha}(x) \le f(x)$

implies

(49)  $\varphi(x) \le l_1(x)$

and conversely. This completes the proof.
7. Miscellaneous remarks. An expedient way of choosing $\delta_1(x)$ and
$\delta_2(x)$ is such that, with c a constant,

(50)  $x + \delta_1(x) = \min\,[x + c,\ 1]$
      $x - \delta_2(x) = \max\,[x - c,\ 0] \qquad (0 \le x \le 1).$

Tables of double entry could be constructed giving the c corresponding to
specified $\alpha$ and n. With such tables available the construction of
confidence limits would be quick and simple in practice. In this case,
$\epsilon_1(x) = \epsilon_2(x) = c$.
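With the choice (50) the confidence limits reduce to shifting the sample distribution function by c and truncating at 0 and 1. The sketch below illustrates this; the function name `confidence_band`, the sample values and the value c = 0.5 are hypothetical, and the constant c belonging to a desired $\alpha$ and n is assumed to be known already (for instance from tables of the kind just described).

```python
from bisect import bisect_left

def confidence_band(sample, c):
    # Returns two functions (lower, upper) giving the confidence limits at
    # any x for the constant-width choice (50).
    xs = sorted(sample)
    n = len(xs)

    def phi(x):
        # Sample distribution function: share of observations smaller than x.
        return bisect_left(xs, x) / n

    def lower(x):
        return max(phi(x) - c, 0.0)

    def upper(x):
        return min(phi(x) + c, 1.0)

    return lower, upper

lower, upper = confidence_band([0.9, 0.2, 0.5, 1.4, 0.7, 1.1], 0.5)
```

For this sample of six values, `lower(1.2)` is 1/3 and `upper(1.2)` is 1, while at x = 0.6 the limits are 0 and 5/6.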

Another expedient and plausible way of choosing $\delta_1(x)$ and
$\delta_2(x)$ might be to choose them so that

(51)  $x + \delta_1(x) = \min\,[px + q,\ 1]$
      $x - \delta_2(x) = \max\,[p'x + q',\ 0] \qquad (0 \le x \le 1)$

where p, p', q, and q' are constants. The actual construction of confidence
limits could then be handled with dispatch if similar tables were constructed.

$\overline{l}_{E,\alpha}(x)$ and $\underline{l}_{E,\alpha}(x)$ are, like
$\varphi(x)$, step-functions. The situation may occur where, for x = e,

     $\lim_{x \to e,\ x < e} \overline{l}_{E,\alpha}(x) < \lim_{x \to e,\ x > e} \underline{l}_{E,\alpha}(x).$

This would give a prediction, corresponding to the confidence coefficient
$\alpha$, that f(x) is not continuous. If f(x) is continuous the probability
of such a situation is 0.


8. Further problems. Even with $\alpha$ fixed, the functions $\delta_1(x)$
and $\delta_2(x)$ may be chosen in many ways. Each different choice gives, in
general, different confidence limits. Which is to be preferred? This very
problem also arose in the theory of parameter estimation and the testing of
hypotheses and gave rise to the Neyman-Pearson theory. It would be desirable
to develop such a theory for the confidence limits discussed in this paper.

We have treated here only the case where f(x) is continuous. A similar
theory is needed for the case where f(x) is not continuous.

It would be of practical value to construct tables such as those described in
Section 7. The construction of tables could be greatly facilitated if the
formulas for P or $\overline{P}$ and $\underline{P}$ could be simplified so as
to render them more practical for calculation or else if they were to be
replaced by asymptotic expansions.


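9. An example. To illustrate the method we shall consider an example for
the case of samples of size 6, i.e. n = 6.

Let $\delta_1(x)$ and $\delta_2(x)$ be given as follows:

     $\delta_1(x) = d$        for $0 \le x \le 1 - d$,
     $\delta_1(x) = 1 - x$    for $1 - d \le x \le 1$,
     $\delta_2(x) = x$        for $0 \le x \le d$,
and
     $\delta_2(x) = d$        for $d \le x \le 1$.

Denote by $\overline{P}$ the probability that

     $\varphi(x) \le f(x) + \delta_1[f(x)],$

by $\underline{P}$ the probability that

     $\varphi(x) \ge f(x) - \delta_2[f(x)]$

and by P the probability that

     $f(x) - \delta_2[f(x)] \le \varphi(x) \le f(x) + \delta_1[f(x)].$

$\varphi(x)$ denotes the sample distribution and f(x) denotes the population
distribution.

Since $\delta_2(x) = \delta_1(1 - x)$, we obviously have

     $\overline{P} = \underline{P}.$

Let us calculate $\overline{P} = \underline{P}$ in case $d = \frac{1}{2}$.
According to (3) we have

     $a_1 = a_2 = a_3 = 0, \quad a_4 = \frac{1}{6}, \quad a_5 = \frac{1}{3}, \quad a_6 = \frac{1}{2}.$

According to (16)

     $\overline{P} = 6!\,\overline{P}_6(1)$

where

     $\overline{P}_0(t) = 1, \qquad \overline{P}_k(t) = \int_{a_k}^{t} \overline{P}_{k-1}(t)\,dt \qquad (k = 1, \ldots, 6).$

Applying this recursion formula we get

     $\overline{P}_1(t) = t; \qquad \overline{P}_2(t) = \frac{t^2}{2}; \qquad \overline{P}_3(t) = \frac{t^3}{6};$

     $\overline{P}_4(t) = \frac{t^4}{24} - \frac{1}{2^7 \cdot 3^5};$

     $\overline{P}_5(t) = \frac{t^5}{120} - \frac{t}{2^7 \cdot 3^5} - \frac{11}{2^7 \cdot 3^6 \cdot 5};$

     $\overline{P}_6(t) = \frac{t^6}{720} - \frac{t^2}{2^8 \cdot 3^5} - \frac{11\,t}{2^7 \cdot 3^6 \cdot 5} - \frac{11}{2^9 \cdot 3^6 \cdot 5};$

     $\overline{P} = \underline{P} = 6!\,\overline{P}_6(1) = 1 - \frac{85}{2592} = 0.967.$

Let us now calculate $\overline{P} = \underline{P}$ in case $d = \frac{1}{3}$.
We have

     $a_1 = a_2 = 0, \quad a_3 = \frac{1}{6}, \quad a_4 = \frac{1}{3}, \quad a_5 = \frac{1}{2} \quad \text{and} \quad a_6 = \frac{2}{3}.$

Applying the recursion formula we get

     $\overline{P}_0(t) = 1, \qquad \overline{P}_1(t) = t, \qquad \overline{P}_2(t) = \frac{t^2}{2}, \qquad \overline{P}_3(t) = \frac{t^3}{6} - \frac{1}{2^4 \cdot 3^4};$

     $\overline{P}_4(t) = \frac{t^4}{24} - \frac{t}{2^4 \cdot 3^4} - \frac{1}{2^4 \cdot 3^5};$

     $\overline{P}_5(t) = \frac{t^5}{120} - \frac{t^2}{2^5 \cdot 3^4} - \frac{t}{2^4 \cdot 3^5} - \frac{11}{2^8 \cdot 3^5 \cdot 5};$

     $\overline{P}_6(t) = \frac{t^6}{720} - \frac{t^3}{2^5 \cdot 3^5} - \frac{t^2}{2^5 \cdot 3^5} - \frac{11\,t}{2^8 \cdot 3^5 \cdot 5} - \frac{13}{2^7 \cdot 3^8 \cdot 5};$

     $\overline{P} = \underline{P} = 6!\,\overline{P}_6(1) = 1 - \frac{2483}{11664} = 0.787.$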


It is obvious that
$$1 - P = (1 - \bar P) + (1 - \underline P) - J,$$
where $J$ denotes the probability that $\varphi(x)$ violates both limits. In case $d = \tfrac12$
no $\varphi(x)$ exists which violates both limits, and therefore $J = 0$. If $d = \tfrac13$,
$J$ is not zero but so small that it can be neglected. Hence
$$P = 0.934 \quad \text{if } d = \tfrac12$$
and
$$P = 0.574 \quad \text{if } d = \tfrac13.$$
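The recursion above lends itself to an exact check with rational arithmetic. The following is only a sketch: the rule $a_k = \max(0, k/n - d)$ is an assumption inferred from the values of $a_k$ quoted in the example (formula (3) itself is not reproduced in this section), and the function names are illustrative.

```python
from fractions import Fraction
from math import factorial


def integrate_poly(coeffs, lower):
    """Coefficients of the polynomial t -> integral from `lower` to t of p(s) ds,
    where p(s) = sum(coeffs[k] * s**k)."""
    anti = [Fraction(0)] + [c / (k + 1) for k, c in enumerate(coeffs)]
    anti[0] = -sum(c * lower ** k for k, c in enumerate(anti))
    return anti


def one_sided_probability(n, d):
    """n! * P_n(1) with P_0 = 1 and P_k(t) = integral from a_k to t of P_{k-1}.

    Assumption (not stated in this section): a_k = max(0, k/n - d).
    """
    lower_limits = [max(Fraction(0), Fraction(k, n) - d) for k in range(1, n + 1)]
    poly = [Fraction(1)]
    for a_k in lower_limits:
        poly = integrate_poly(poly, a_k)
    return factorial(n) * sum(poly)  # sum of coefficients = value at t = 1


p_half = one_sided_probability(6, Fraction(1, 2))
p_third = one_sided_probability(6, Fraction(1, 3))
# Two-sided probability, neglecting J as in the text.
two_sided_half = 2 * p_half - 1
two_sided_third = 2 * p_third - 1
```

With $J$ neglected, `2 * p - 1` gives the two-sided probability $P$ used in the text.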


$P$ increases monotonically from 0.574 to 0.934 if $d$ increases from $\tfrac13$ to $\tfrac12$. Denote
by $P_d$ the probability corresponding to $d$. According to (33)-(38), the
confidence limits corresponding to the probability level $P_d$ are given as follows:

$$\bar l_{E,P_d}(x) = \varphi(x) + d \quad \text{if } \varphi(x) + d < 1,$$
$$\bar l_{E,P_d}(x) = 1 \quad \text{if } \varphi(x) + d \ge 1,$$
$$\underline l_{E,P_d}(x) = \varphi(x) - d \quad \text{if } \varphi(x) - d > 0$$
and
$$\underline l_{E,P_d}(x) = 0 \quad \text{if } \varphi(x) - d \le 0.$$

Substituting for $d$ the numbers $\tfrac12$ and $\tfrac13$, we get the confidence limits
corresponding to the probability levels 0.934 and 0.574 respectively. The upper and lower
confidence limits for the population distribution corresponding to the probability
level 0.574 are represented geometrically in Figure 1 by the upper and lower
dotted broken lines for a sample of 6 having the values $x_1, x_2, \dots, x_6$. The
sample distribution is represented by the solid broken line.
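As a rough illustration of how such a band could be tabulated, the sketch below evaluates the clipped limits at each observed value. The function name and the sample values are hypothetical, and the sample distribution is taken just after each jump.

```python
def confidence_band(sample, d):
    """Return (x, lower, upper) at each ordered observation.

    The sample distribution at the k-th ordered value is taken as k/n;
    the limits are the sample distribution -/+ d, clipped to [0, 1].
    """
    ordered = sorted(sample)
    n = len(ordered)
    band = []
    for k, x in enumerate(ordered, start=1):
        phi = k / n
        band.append((x, max(0.0, phi - d), min(1.0, phi + d)))
    return band


# Hypothetical sample of size 6 with d = 1/3 (probability level 0.574 in the text).
band = confidence_band([0.3, 0.1, 0.2, 0.6, 0.5, 0.4], 1 / 3)
```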
ON THE POWER OF THE $L_1$ TEST FOR EQUALITY OF SEVERAL
VARIANCES


By George W. Brown


The criterion $L_1$ was obtained by Neyman and Pearson¹ for testing the
statistical hypothesis $H_1$ that $k$ samples, known to be from normal universes,
are actually from universes with equal variances, where the means are unspecified.
The test seems to be of importance, when one considers the number of
experiments which are concerned with the comparison of several types of
treatments. The experimenter is in most cases interested in the respective
means, and it is usually assumed, in order to test for significance of the
difference between sample means, that the variances of the distributions are equal.
At present, significance tests for justifying this assumption are rarely applied.
Because of the unsatisfactory status of the problem of testing simultaneously
for means and variances, the $L_1$ test is appropriate for justifying first the
assumption of equal variances before testing for the means.

Neyman and Pearson have treated the sampling distribution of $L_1$ when $H_1$
is true, and Wilks and Thompson² have discussed the general distribution of the
criterion when $H_1$ is not true. Here we shall show that the test is unbiassed
when the number of observations is the same in each sample, and is in general
unbiassed in the limit, in a certain sense. In addition, values of the power
function have been computed for a few selected cases, when $k$ is 2, in order to
exhibit qualitatively the sharpness of the test.

Let the $i$-th sample ($i = 1, 2, \dots, k$) of $n_i$ individuals be denoted by $\Sigma_i$
and suppose $\Sigma_i$ has been drawn at random from a normal population with mean
$m_i$ (unknown) and variance $\sigma_i^2 = 1/A_i$. Denote the observations of $\Sigma_i$ by
$x_{ir}$ ($r = 1, 2, \dots, n_i$). Then the criterion $L_1$ is expressible³ in terms of the
observations as follows:


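$$L_1 = \frac{n \prod_{i=1}^{k} (c_i^2)^{n_i/n}}{\prod_{i=1}^{k} n_i^{n_i/n} \, \sum_{i=1}^{k} c_i^2} \tag{1}$$

where $n = \sum n_i$ and $c_i^2 = \sum_{r=1}^{n_i} (x_{ir} - \bar x_i)^2$. For convenience we shall let $L_1^{n/2} = \lambda$.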
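As a minimal sketch of (1), the criterion can be evaluated as the ratio of the weighted geometric mean to the weighted arithmetic mean of the quantities $c_i^2/n_i$. The function name is illustrative, and every sample is assumed to have a non-zero sum of squares.

```python
import math


def l1_criterion(samples):
    """L_1 as in (1): weighted geometric mean of c_i^2 / n_i divided by
    their weighted arithmetic mean (weights n_i / n).

    Assumes each sample has at least two distinct observations, so c_i^2 > 0.
    """
    n = sum(len(s) for s in samples)
    log_geometric = 0.0
    total_c2 = 0.0
    for s in samples:
        n_i = len(s)
        mean_i = sum(s) / n_i
        c2 = sum((x - mean_i) ** 2 for x in s)
        log_geometric += (n_i / n) * math.log(c2 / n_i)
        total_c2 += c2
    return math.exp(log_geometric) / (total_c2 / n)


equal_spread = l1_criterion([[1, 2, 3], [4, 5, 6]])
unequal_spread = l1_criterion([[1, 2, 3], [0, 10, 20]])
```

Values near 1 indicate similar sample variances; small values speak against $H_1$.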


¹ [1], pp. 461-464.


² See [4]. Nayer [3] studied the Type I approximation to the criterion $L_1$ and
tabulated significance limits, etc.


³ See [1], p. 464.
The variables $A_i c_i^2$ are independently distributed according to $\chi^2$-laws with
$n_i - 1$ degrees of freedom, respectively, hence the joint distribution of the $c_i^2$,
when $A_i$ is the true value of $1/\sigma_i^2$ ($i = 1, 2, \dots, k$), is given by

$$\frac{1}{2^{\frac12 (n-k)} \prod_{i=1}^{k} \Gamma\!\left(\frac{n_i - 1}{2}\right)} \prod_{i=1}^{k} A_i^{\frac12 (n_i - 1)} (c_i^2)^{\frac12 (n_i - 3)} e^{-\frac12 A_i c_i^2} \, dc_1^2 \cdots dc_k^2. \tag{2}$$


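The power function,⁴ which is defined as the probability of rejecting $H_1$, is
given by $P(\lambda < \lambda_0)$, and is a function of the true values of the parameters
$A_1, \dots, A_k$, where $\lambda_0$ is defined so that $P(\lambda < \lambda_0) = \alpha$ when $H_1$ is true. Thus

$$F(A_1, \dots, A_k) = P(\lambda < \lambda_0) = \frac{1}{2^{\frac12 (n-k)} \prod_{i=1}^{k} \Gamma\!\left(\frac{n_i - 1}{2}\right)} \int_{\lambda < \lambda_0} \prod_{i=1}^{k} A_i^{\frac12 (n_i - 1)} (c_i^2)^{\frac12 (n_i - 3)} e^{-\frac12 A_i c_i^2} \, dc_1^2 \cdots dc_k^2. \tag{3}$$

Note that when $H_1$ is true $P(\lambda < \lambda_0)$ is independent of the actual common
value of the parameters, because of the homogeneity of $\lambda$.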

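Let us now restrict ourselves to the case in which $n_i = p$, $n = kp$. (1)
and (3) become

$$\lambda = \frac{k^{\frac12 kp} \prod_{i=1}^{k} c_i^{\,p}}{\left\{ \sum_{i=1}^{k} c_i^2 \right\}^{\frac12 kp}} \tag{1'}$$

and

$$F(A_1, A_2, \dots, A_k) = \frac{1}{2^{\frac12 k(p-1)} \left[ \Gamma\!\left(\frac{p-1}{2}\right) \right]^k} \int_{\lambda < \lambda_0} \prod_{i=1}^{k} A_i^{\frac12 (p-1)} (c_i^2)^{\frac12 (p-3)} e^{-\frac12 \sum A_i c_i^2} \, dc_1^2 \cdots dc_k^2. \tag{3'}$$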


We shall prove the following

THEOREM: If $n_1 = n_2 = \cdots = n_k = p$, then $F(A_1, A_2, \dots, A_k) \ge
F(A, A, \dots, A)$. In other words, the probability of rejecting $H_1$ when $H_1$
is true is less than or at most equal to the probability of rejecting the hypothesis
when any alternative is true, that is, the test is unbiassed. It should be noted
that the statement of the theorem is to hold for each value of $\lambda_0$.

It is evident that $F(A_1, A_2, \dots, A_k)$ remains invariant under permutations
of the arguments, because of the symmetry in the $c_i^2$ of $\lambda$ and of the integrand
in (3'). Moreover, by using the homogeneity of $\lambda$ we obtain the following
relations

$$F(A_1, A_2, \dots, A_k) = F\!\left( \frac{A_1}{A_k}, \frac{A_2}{A_k}, \dots, \frac{A_{k-1}}{A_k}, 1 \right) = F\!\left( 1, \frac{A_2}{A_1}, \dots, \frac{A_{k-1}}{A_1}, \frac{A_k}{A_1} \right). \tag{4}$$


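⁴ Defined by Neyman and Pearson, [2], p. 5.

Now if we set $a_i = A_i / A_k$ ($i = 1, 2, \dots, k - 1$), we may replace $F(A_1, A_2, \dots, A_k)$
by $F(a_1, a_2, \dots, a_{k-1}, 1) = f(a_1, \dots, a_{k-1})$; we must now show that
$f(a_1, \dots, a_{k-1}) \ge f(1, 1, \dots, 1)$. From (4) we obtain

$$F(a_1, a_2, \dots, a_{k-1}, 1) = F\!\left( 1, \frac{a_2}{a_1}, \dots, \frac{a_{k-1}}{a_1}, \frac{1}{a_1} \right) \tag{4'}$$
and permuting the arguments we have, finally,

$$f(a_1, a_2, \dots, a_{k-1}) = f\!\left( \frac{1}{a_1}, \frac{a_2}{a_1}, \dots, \frac{a_{k-1}}{a_1} \right). \tag{5}$$

Differentiate (5) with respect to $a_1$:
$$f_1(a_1, a_2, \dots, a_{k-1}) = -\frac{1}{a_1^2} \left[ f_1\!\left( \frac{1}{a_1}, \frac{a_2}{a_1}, \dots \right) + a_2 f_2\!\left( \frac{1}{a_1}, \frac{a_2}{a_1}, \dots \right) + \cdots + a_{k-1} f_{k-1}\!\left( \frac{1}{a_1}, \frac{a_2}{a_1}, \dots \right) \right], \tag{6}$$
where $f_i$ denotes the partial derivative of $f$ with respect to its $i$-th argument.
Now set $a_1 = a_2 = \cdots = a_{k-1} = 1$, obtaining $f_1(1, 1, \dots, 1) =
-\sum_{i=1}^{k-1} f_i(1, 1, \dots, 1)$. But $f_1(1, 1, \dots, 1) = f_i(1, 1, \dots, 1)$, hence

$$f_i(1, 1, \dots, 1) = 0; \quad i = 1, 2, \dots, k - 1.$$

Now differentiating (6) with respect to $a_1$ and evaluating at $a_i = 1$, we have
$f_{11}(1, 1, \dots, 1) = \sum_{i,j} f_{ij}(1, 1, \dots, 1)$, that is,

$$f_{11}(1, 1, \dots, 1) - f_{11}(1, 1, \dots, 1) - f_{22}(1, 1, \dots, 1) - \cdots - f_{k-1,k-1}(1, 1, \dots, 1) = 2 \sum_{i<j} f_{ij}(1, 1, \dots, 1),$$

hence, by the symmetry of the variables,

$$f_{ij}(1, 1, \dots, 1) = -\frac{1}{k-1} f_{11}(1, 1, \dots, 1), \quad i \ne j; \qquad f_{ii}(1, 1, \dots, 1) = f_{11}(1, 1, \dots, 1). \tag{8}$$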


It is easily verified from (8) that the $f_{ij}(1, 1, \dots, 1)$ are coefficients of a
definite quadratic form in $k - 1$ variables. Therefore there is an extremum
at $(1, 1, \dots, 1)$. It remains to show that $f_{11}(1, 1, \dots, 1) > 0$ in order to
establish that the extremum is actually a minimum.

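In (3') we make the transformation $u_i = c_i^2 / c_k^2$; $i = 1, \dots, k - 1$; $u_k = A_k c_k^2$,
and integrate out the variable $u_k$, since $\lambda$ is now independent of $u_k$,
obtaining

$$f(a_1, a_2, \dots, a_{k-1}) = B \prod_{i=1}^{k-1} a_i^{\frac12 (p-1)} \int_{\lambda < \lambda_0} \frac{\prod_{i=1}^{k-1} u_i^{\frac12 (p-3)}}{\left[ 1 + \sum_{i=1}^{k-1} a_i u_i \right]^{\frac12 k(p-1)}} \, du_1 \cdots du_{k-1}, \tag{9}$$

$$\lambda = \frac{k^{\frac12 kp} \prod_{i=1}^{k-1} u_i^{\frac12 p}}{\left[ 1 + \sum_{i=1}^{k-1} u_i \right]^{\frac12 kp}}, \tag{10}$$

where $B$ is some positive constant independent of the $a_i$. From (9)

$$f_1(a_1, \dots, a_{k-1}) = B \prod_{i=1}^{k-1} a_i^{\frac12 (p-1)} \int_{\lambda < \lambda_0} \left\{ \frac{p-1}{2 a_1} - \frac{k(p-1) u_1}{2 \left[ 1 + \sum_{i=1}^{k-1} a_i u_i \right]} \right\} \frac{\prod_{i=1}^{k-1} u_i^{\frac12 (p-3)}}{\left[ 1 + \sum_{i=1}^{k-1} a_i u_i \right]^{\frac12 k(p-1)}} \, du_1 \cdots du_{k-1}. \tag{11}$$

The last step involves differentiation under the sign of integration, which is
certainly justifiable here.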
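Now consider $\lambda$ for fixed $u_2, \dots, u_{k-1}$, and variable $u_1$. $\lambda < \lambda_0$ is equivalent
to the statement $\dfrac{u_1}{(\varphi + u_1)^k} < \theta$ where $\varphi$ and $\theta$ depend on $u_2, u_3, \dots, u_{k-1}$;
$\varphi, \theta > 0$. The function $\psi(u_1) = \dfrac{u_1}{(\varphi + u_1)^k}$ has a maximum at $u_1 = \dfrac{\varphi}{k-1}$
and has no other extrema, hence the equation $\dfrac{u_1}{(\varphi + u_1)^k} = \theta$ has but two
positive roots, $x_1$ and $x_2$, say. Let $x_2 > x_1$. Then for fixed $u_2, u_3, \dots, u_{k-1}$
the region $\lambda < \lambda_0$ is composed of the $u_1$ intervals $(0, x_1)$ and $(x_2, \infty)$. Now
examining the integrand in (11) we see that it is the partial derivative with
respect to $u_1$ of the quantity

$$\frac{1}{a_1} \, \frac{u_1^{\frac12 (p-1)} \prod_{i=2}^{k-1} u_i^{\frac12 (p-3)}}{\left[ 1 + \sum_{i=1}^{k-1} a_i u_i \right]^{\frac12 k(p-1)}}.$$

This quantity vanishes at $0$ and $\infty$, hence

$$f_1 = -B \, a_1^{\frac12 (p-3)} \prod_{i=2}^{k-1} a_i^{\frac12 (p-1)} \int_G \prod_{i=2}^{k-1} u_i^{\frac12 (p-3)} \left[ \frac{u_1^{\frac12 (p-1)}}{\left( 1 + \sum_{i=1}^{k-1} a_i u_i \right)^{\frac12 k(p-1)}} \right]_{x_1}^{x_2} du_2 \cdots du_{k-1} \tag{12}$$

where $G$ is some region of positive measure in the space of the variables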
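$u_2, u_3, \dots, u_{k-1}$. Now differentiating in (12) with respect to $a_1$, and setting
$a_1 = a_2 = \cdots = a_{k-1} = 1$, we get

$$f_{11}(1, 1, \dots, 1) = -B \int_G \prod_{i=2}^{k-1} u_i^{\frac12 (p-3)} \left\{ \frac{p-3}{2} \left[ \frac{u_1^{\frac12 (p-1)}}{(\varphi + u_1)^{\frac12 k(p-1)}} \right]_{x_1}^{x_2} - \frac{k(p-1)}{2} \left[ \frac{u_1^{\frac12 (p+1)}}{(\varphi + u_1)^{\frac12 k(p-1) + 1}} \right]_{x_1}^{x_2} \right\} du_2 \cdots du_{k-1}.$$

The first term inside the braces has the value $\theta^{\frac12 (p-1)}$ both at $x_1$ and $x_2$, hence
vanishes when evaluated between those limits, so that

$$f_{11}(1, 1, \dots, 1) = \frac{k(p-1)}{2} B \int_G \prod_{i=2}^{k-1} u_i^{\frac12 (p-3)} \left[ \frac{x_2^{\frac12 (p+1)}}{(\varphi + x_2)^{\frac12 k(p-1) + 1}} - \frac{x_1^{\frac12 (p+1)}}{(\varphi + x_1)^{\frac12 k(p-1) + 1}} \right] du_2 \cdots du_{k-1}. \tag{13}$$

$x_1$ and $x_2$ are roots of the equation $\dfrac{u_1}{(\varphi + u_1)^k} = \theta$, hence $x_1 = \theta (\varphi + x_1)^k$ and
$x_2 = \theta (\varphi + x_2)^k$. Putting these values in the numerators in (13), we have

$$f_{11}(1, 1, \dots, 1) = \frac{k(p-1)}{2} B \int_G \theta^{\frac12 (p+1)} \prod_{i=2}^{k-1} u_i^{\frac12 (p-3)} \left\{ (\varphi + x_2)^{k-1} - (\varphi + x_1)^{k-1} \right\} du_2 \cdots du_{k-1}.$$

The integrand is positive, since $\theta, \varphi > 0$ and $x_2 > x_1$, hence $f_{11}(1, 1, \dots, 1) > 0$.
We have shown, then, that the power function has a relative minimum, at
least, when $H_1$ is true. We shall show that the minimum is in fact an absolute
minimum.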

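Consider the integrand in (12). Taken together with the negative sign standing
before the integral, the integrand has the same sign as

$$\frac{x_1^{\frac12 (p-1)}}{\left( 1 + a_1 x_1 + \sum_{2}^{k-1} a_i u_i \right)^{\frac12 k(p-1)}} - \frac{x_2^{\frac12 (p-1)}}{\left( 1 + a_1 x_2 + \sum_{2}^{k-1} a_i u_i \right)^{\frac12 k(p-1)}}.$$

But $x_1 = \theta (1 + x_1 + \sum u_i)^k$ and $x_2 = \theta (1 + x_2 + \sum u_i)^k$. Hence the integrand
has the same sign as

$$\frac{1 + x_1 + \sum_{2}^{k-1} u_i}{1 + a_1 x_1 + \sum_{2}^{k-1} a_i u_i} - \frac{1 + x_2 + \sum_{2}^{k-1} u_i}{1 + a_1 x_2 + \sum_{2}^{k-1} a_i u_i},$$

so that the integrand is positive or negative accordingly as
$(x_1 - x_2) \left[ 1 + \sum_{2}^{k-1} a_i u_i - a_1 \left( 1 + \sum_{2}^{k-1} u_i \right) \right]$ is positive or negative. Since $x_1 < x_2$, this
last expression is positive if $a_1 > 1$ and $a_i \le a_1$, and negative if $a_1 < 1$ and
$a_i \ge a_1$. Hence we conclude that $\dfrac{\partial f}{\partial a_1} > 0$ if $a_1 > 1$ and $a_i \le a_1$, and $\dfrac{\partial f}{\partial a_1} < 0$
if $a_1 < 1$ and $a_i \ge a_1$. By the symmetry in the variables the same is true of
$\dfrac{\partial f}{\partial a_i}$, i.e., $\dfrac{\partial f}{\partial a_i} > 0$ if $a_i > 1$ and $a_i = \max (a_j)$, and $\dfrac{\partial f}{\partial a_i} < 0$ if $a_i < 1$ and $a_i = \min (a_j)$.
Now suppose $(a_1', \dots, a_{k-1}') \ne (1, \dots, 1)$. Then either $\max (a_i') > 1$
or $\min (a_i') < 1$. Hence the first partials can vanish simultaneously only at
$(1, 1, \dots, 1)$, so that $f$ can have no other extrema. Therefore $f$ must have
an absolute minimum at $(1, 1, \dots, 1)$. This completes the proof that the $L_1$
test is unbiassed when $n_1 = n_2 = \cdots = n_k$.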

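It is easily seen that the test is in general biassed when the samples consist
of different numbers of observations. Consider the case $k = 2$, with samples
of $n_1$ and $n_2$ observations respectively. In this case we have the single
parameter $a = A_1 / A_2$. As in (9) and (10),

$$f(a) = B a^{\frac12 (n_1 - 1)} \int_{\lambda < \lambda_0} \frac{u^{\frac12 (n_1 - 3)}}{(1 + au)^{\frac12 n - 1}} \, du, \tag{14}$$

$$\lambda = \frac{n^{\frac12 n}}{n_1^{\frac12 n_1} n_2^{\frac12 n_2}} \, \frac{u^{\frac12 n_1}}{(1 + u)^{\frac12 n}}. \tag{15}$$

As before, the equation $\lambda = \lambda_0$ has but two positive roots, $x_2 > x_1 > 0$, so that,
as in (12),

$$f'(a) = -B a^{\frac12 (n_1 - 3)} \left[ \frac{u^{\frac12 (n_1 - 1)}}{(1 + au)^{\frac12 n - 1}} \right]_{x_1}^{x_2} = B a^{\frac12 (n_1 - 3)} \left[ \frac{x_1^{\frac12 (n_1 - 1)}}{(1 + a x_1)^{\frac12 n - 1}} - \frac{x_2^{\frac12 (n_1 - 1)}}{(1 + a x_2)^{\frac12 n - 1}} \right].$$

Therefore

$$f'(1) = B \left[ \frac{x_1^{\frac12 (n_1 - 1)}}{(1 + x_1)^{\frac12 n - 1}} - \frac{x_2^{\frac12 (n_1 - 1)}}{(1 + x_2)^{\frac12 n - 1}} \right].$$

Recalling that $\dfrac{x_1^{\frac12 n_1}}{(1 + x_1)^{\frac12 n}} = \dfrac{x_2^{\frac12 n_1}}{(1 + x_2)^{\frac12 n}}$, it is evident that $f'(1) = 0$ if and only if
$n_1 = \dfrac{n}{2}$. Hence if $n_1 \ne n_2$, the power function does not have a minimum at
$a = 1$. It can be shown in this case that a minimum does exist at some point,
and if $n \to \infty$ so that $n_1 = \alpha n$, then the minimum tends to the point $a = 1$.
The proof is omitted, in view of the fact that a general result of a different
nature will be obtained.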

Before proceeding, we shall establish a lemma which is undoubtedly well 
known. However, on account of the directness of the argument, the proof is 
given here. 

Lemma: If $x_1, x_2, \dots, x_h$ have joint distribution function $f_n(x_1, x_2, \dots, x_h)$
such that $E(x_i) \to m_i$ and $E[(x_i - E(x_i))^2] \to 0$, and if $y = g(x_1, x_2, \dots, x_h)$
is continuous in $x_1, x_2, \dots, x_h$ at the point $(m_1, m_2, \dots, m_h)$, then the
distribution of $y$ converges stochastically to the point $g(m_1, m_2, \dots, m_h)$.

Proof: By Tschebycheff's Inequality,

$$P\!\left[ |x_i - E(x_i)| > \frac{\delta}{2} \right] \le \frac{4}{\delta^2} E[(x_i - E(x_i))^2].$$

Let $n$ be large enough so that $|E(x_i) - m_i| < \dfrac{\delta}{2}$; $i = 1, 2, \dots, h$. Then
$|x_i - m_i| > \delta$ implies $|x_i - E(x_i)| > \dfrac{\delta}{2}$, hence

$$P[\,|x_i - m_i| > \delta\,] \le \frac{4}{\delta^2} E[(x_i - E(x_i))^2].$$

Let $w_\delta$ denote a cube of side $2\delta$ about the point $(m_1, \dots, m_h)$, and let $x$ denote
the point $(x_1, \dots, x_h)$.

$$P[x \notin w_\delta] \le \sum_i P[\,|x_i - m_i| > \delta\,],$$

$$P[x \notin w_\delta] \le \frac{4}{\delta^2} \sum_i E[(x_i - E(x_i))^2],$$

therefore $P[x \notin w_\delta] \to 0$, that is $P[x \in w_\delta] \to 1$. Given any interval $w_\epsilon$ about
the point $y = g(m_1, m_2, \dots, m_h)$, there is a cube $w_\delta$ about $(m_1, m_2, \dots, m_h)$
such that $x \in w_\delta$ implies $y \in w_\epsilon$. $P[x \in w_\delta] \le P[y \in w_\epsilon]$, but $P[x \in w_\delta] \to 1$,
therefore $P[y \in w_\epsilon] \to 1$. That is, $y$ converges stochastically to the point
$y = g(m_1, m_2, \dots, m_h)$.

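Referring to (1), we may express $\lambda$ as a function of $k - 1$ variables as follows:

$$\lambda = \frac{n^{\frac12 n} \prod_{i=1}^{k-1} u_i^{\frac12 n_i}}{\prod_{i=1}^{k} n_i^{\frac12 n_i} \left[ 1 + \sum_{i=1}^{k-1} u_i \right]^{\frac12 n}}$$

where $u_i = c_i^2 / c_k^2$; $i = 1, 2, \dots, k - 1$. Let $n \to \infty$, and let $n_i = \alpha_i n$, $\sum \alpha_i = 1$.
Then

$$\lambda^{2/n} = \frac{\prod_{i=1}^{k-1} u_i^{\alpha_i}}{\prod_{i=1}^{k} \alpha_i^{\alpha_i} \left[ 1 + \sum_{i=1}^{k-1} u_i \right]}.$$

From (2) it is seen that

$$E(u_i) = E(c_i^2) \, E\!\left( \frac{1}{c_k^2} \right) = \frac{A_k}{A_i} \, \frac{n_i - 1}{n_k - 3} \to \frac{1}{a_i} \frac{\alpha_i}{\alpha_k}$$

and

$$E(u_i^2) = \left( \frac{A_k}{A_i} \right)^2 \frac{(n_i - 1)(n_i + 1)}{(n_k - 3)(n_k - 5)} \to \left( \frac{1}{a_i} \frac{\alpha_i}{\alpha_k} \right)^2;$$

in other words, $E[(u_i - E(u_i))^2] \to 0$. Now we apply the lemma, concluding that $\lambda^{2/n}$,
that is, $L_1$, converges stochastically to the quantity

$$r = \frac{\prod_{i=1}^{k-1} a_i^{-\alpha_i}}{\alpha_k + \sum_{i=1}^{k-1} \alpha_i / a_i}.$$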

TABLE I

(2) $n_1 = 10$, $n_2 = 10$     (3) $n_1 = 20$, $n_2 = 20$

$a$ | $f(a)$

.05
.09
.31
.65
.84
.93
TABLE II

$n_1 = 12$, $n_2 = 8$          $n_1 = 15$, $n_2 = 10$
$a$      $f(a)$            $a$

1/10     .90               1/10
1/5      .61               1/5
1/4      .47               1/4
1/3      .32               1/3
1/2      .16               1/2
3/5      .11               3/5
         .07
         .05               5
         .05
         .06
         .13
         .30
         .45
         .60
         .67
         .87
$r$ is the ratio of weighted geometric mean to arithmetic mean of the quantities
$\dfrac{1}{a_1}, \dots, \dfrac{1}{a_{k-1}}, 1$, hence $r = 1$ if and only if $a_1 = a_2 = \cdots = a_{k-1} = 1$,
otherwise $r < 1$. Therefore when $H_1$ is true $\lambda^{2/n}$ converges stochastically to 1,
otherwise $\lambda^{2/n}$ converges stochastically to some value less than 1.
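The limiting value can be illustrated numerically. The sketch below assumes the reading given above, namely that $r$ is the weighted geometric mean of the quantities $1/a_i$ (with $a_k = 1$ included) divided by their weighted arithmetic mean, the weights being the sample-size proportions; the function and argument names are illustrative.

```python
import math


def limiting_ratio(a, weights):
    """Weighted geometric mean of 1/a_i over their weighted arithmetic mean.

    `a` lists a_1, ..., a_k (with a_k = 1 supplied by the caller) and
    `weights` lists the proportions n_i / n, assumed to sum to 1.
    """
    geometric = math.exp(sum(w * math.log(1.0 / x) for x, w in zip(a, weights)))
    arithmetic = sum(w / x for x, w in zip(a, weights))
    return geometric / arithmetic


r_null = limiting_ratio([1.0, 1.0, 1.0], [0.2, 0.3, 0.5])
r_alt = limiting_ratio([3.0, 1.0], [0.5, 0.5])
```

Under $H_1$ the ratio equals 1; for two equal-sized samples with $a = 3$ it equals $\sqrt{3}/2$.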
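Let us choose $\lambda_0^{(n)}$ so that $P(\lambda < \lambda_0^{(n)}) = \alpha$ when $H_1$ is true. Consider some
alternative hypothesis $H_1'$; $\lambda^{2/n}$ converges stochastically to $r < 1$. Choose $c$
so that $r < c < 1$. $P(\lambda^{2/n} < c) \to 0$ when $H_1$ is true, but
$P\!\left( \lambda^{2/n} < (\lambda_0^{(n)})^{2/n} \right) = \alpha$
when $H_1$ is true, thus, for $n$ sufficiently large, $c < (\lambda_0^{(n)})^{2/n}$, that is, $c^{n/2} < \lambda_0^{(n)}$.
Therefore $P(\lambda < \lambda_0^{(n)}) \ge P(\lambda < c^{n/2}) = P(\lambda^{2/n} < c)$. Now, if $H_1'$ is true,
$P(\lambda^{2/n} < c) \to 1$, therefore $P(\lambda < \lambda_0^{(n)}) \to 1$.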

We have shown then, that if $n \to \infty$ so that $n_i = \alpha_i n$, where the $\alpha_i$ are fixed,
while the probability level $\alpha$ remains constant, then the power of the test with respect
to any alternative hypothesis $H_1'$ tends to unity. It is impossible, of course, to
have the power function tend to unity uniformly with respect to all alternative
hypotheses, since the power function is continuous for all $n$, and since the
power with respect to $H_1$ is constantly $\alpha$. What we can conclude, however, is
that for any particular alternative hypothesis, the probability of rejecting $H_1$
is greater than $\alpha$ for sufficiently large $n$.⁵ (We might say, then, that the test
is asymptotically unbiassed.) Moreover, the fact that the power with respect
to $H_1'$ tends to unity implies that the test becomes sharper with increasing $n$.

In order to illustrate the sharpness of the test, values of the power function
were computed, when $k = 2$, for the cases $n_1 = n_2 = 5$; $n_1 = n_2 = 10$; $n_1 =
n_2 = 20$; $n_1 = 12$, $n_2 = 8$; and $n_1 = 15$, $n_2 = 10$. The results are given in
Tables I and II. The computations were made from (14) and (15) by means
of Pearson's Tables of the Incomplete Beta Function. The roots $x_1$ and $x_2$ of the
equation $\lambda = \lambda_0$ were determined, for $\alpha = .05$, by trial and error, making it
possible to use the tables directly to compute as many values of the power
function as desired.
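A computation of this kind can be sketched without tables. The code below is not the author's procedure: it assumes equal sample sizes, for which the two roots satisfy $x_2 = 1/x_1$, uses the substitution $t = au/(1 + au)$ to turn the integral in (14) into an incomplete beta integral, and replaces the tables by Simpson's rule. All function names are illustrative.

```python
import math


def beta_cdf(z, p, q, steps=2000):
    """Incomplete beta ratio I_z(p, q) by composite Simpson's rule.

    Intended only for p, q > 1, where the density is bounded.
    """
    if z <= 0.0:
        return 0.0
    if z >= 1.0:
        return 1.0
    log_norm = math.lgamma(p + q) - math.lgamma(p) - math.lgamma(q)

    def density(t):
        if t <= 0.0 or t >= 1.0:
            return 0.0
        return math.exp(log_norm + (p - 1) * math.log(t) + (q - 1) * math.log(1 - t))

    h = z / steps
    total = density(0.0) + density(z)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * density(i * h)
    return total * h / 3


def critical_roots(n_each, alpha=0.05):
    """Roots x1 < x2 of lambda = lambda_0 when n1 = n2 = n_each (x2 = 1 / x1)."""
    p = (n_each - 1) / 2
    lo, hi = 0.0, 0.5
    for _ in range(60):  # bisection on the lower tail area alpha / 2
        mid = (lo + hi) / 2
        if beta_cdf(mid, p, p) < alpha / 2:
            lo = mid
        else:
            hi = mid
    z1 = (lo + hi) / 2
    x1 = z1 / (1 - z1)
    return x1, 1 / x1


def power(a, n_each, alpha=0.05):
    """Power f(a) for two samples of n_each observations each."""
    p = (n_each - 1) / 2
    x1, x2 = critical_roots(n_each, alpha)
    accept = beta_cdf(a * x2 / (1 + a * x2), p, p) - beta_cdf(a * x1 / (1 + a * x1), p, p)
    return 1 - accept
```

For example, `power(3.0, 20)` may be compared with the value $f(3) = .65$ quoted in the text for $n_1 = n_2 = 20$.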

When $n_1 = 12$, $n_2 = 8$, and $n_1 = 15$, $n_2 = 10$, the power functions both
have minima at approximately $a = 1.1$, indicating that the bias is certainly
not serious. When $n_1 = n_2$, the power function has the same value at $a$ and
$1/a$; in the other cases the values shift slightly. Note that when $n_1 = n_2 = 20$
the test is fairly delicate. For example, $f(3) = .65$, that is, if $\sigma_2 = \sqrt{3}\,\sigma_1$,
the probability of rejecting $H_1$ is .65. In Figure 1, the power functions have been
plotted against $\log a$, because of the symmetry in the values $a$ and $1/a$. The
curves $I_1$, $I_2$, $I_3$ correspond to columns 1, 2, 3 respectively of Table I.
Similarly, curves $II_1$, $II_2$ correspond to columns 1 and 2 of Table II.
COMPLETE SIMULTANEOUS FIDUCIAL DISTRIBUTIONS
By M. S. Bartlett


1. Introduction. In a recent paper in these Annals, Starkey [13] has made
some investigation of the distribution¹ related to the Behrens-Fisher test of the
difference between two means from normal populations with unequal variances.
She does not, however, give any critical discussion of the validity of this
proposed test in the light of criticisms that have been made of it. It may therefore
be an appropriate opportunity of reviewing the theory of fiducial distributions,
as I see it, up to the present stage of development,² and in particular, of referring
to the idea of complete simultaneous fiducial distributions. In conclusion I
have made some brief comment on the particular problem at issue, in the light
of this general theory; and have added a note on the use of approximate tests.


2. Fiducial Probability. If from a sample denoted symbolically by $S$ a
statistic $T$ is obtained whose chance distribution depends on one unknown
parameter $\theta$, the distribution of $T$ being of the form

$$p(T \mid \theta) = f(T, \theta) \, dT,$$

and if the values of $T$ bear a regular increasing relationship with $\theta$ (for an
assigned value of the probability integral), then for any particular value $T = T_0$,
we may assert that $\theta > \theta_0$, where

$$\int_{-\infty}^{T_0} p(T \mid \theta_0) = 1 - \epsilon,$$

and we shall know that this assertion, in the system of inferences based on the
above rule, will have an exact and known probability of being wrong, given by $\epsilon$.
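A minimal sketch of this rule, for the hypothetical case in which $T$ is normally distributed about $\theta$ with known standard deviation (an assumption made here only for illustration, not an example from the paper):

```python
from statistics import NormalDist


def lower_fiducial_limit(t0, eps, sigma=1.0):
    """theta_0 such that P(T <= t0 | theta_0) = 1 - eps when T ~ N(theta, sigma^2)."""
    return t0 - sigma * NormalDist().inv_cdf(1.0 - eps)


# Observed T_0 = 2.0 with eps = 0.05: assert theta > theta_0.
theta0 = lower_fiducial_limit(2.0, 0.05)
tail_check = NormalDist(mu=theta0, sigma=1.0).cdf(2.0)
```

The assertion $\theta > \theta_0$ is then wrong only when the observed $T_0$ falls in the upper $\epsilon$ tail of its distribution.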

The inference is thus an uncertain one, but the extent of the uncertainty is
exactly known, and as stressed by Fisher [6], who first introduced this important
concept of fiducial inferences and fiducial probability, is completely independent
of any a priori notion of what value $\theta$ is likely to be.

It might be emphasized, to avoid confusion, that the inference is a deduction
from the standpoint of logic, and still requires, if applied in practice, the necessity
of inductive assumptions concerning the applicability of the mathematical
theory, but its avoidance of any appeal to a priori probability in regard to the
value of $\theta$ gives it a completely independent status distinct from the classical
inverse probability argument, from which it should be distinguished. The


¹ This distribution has also been studied by Sukhatme [14].
² See also the recent expository article by Wilks [16].
interval assigned to $\theta$ is to some extent arbitrary, and we can more generally
choose $\theta_0$ and $\theta_1$ such that the fiducial probability of

$$\theta_1 < \theta < \theta_0$$

is equal to $1 - \epsilon$. While this fiducial probability is a probability in a formal
mathematical sense, I have suggested [2] that its special meaning in regard to
the inference on $\theta$ might be emphasized if we distinguished it by a special
symbol. Since intervals $(\theta_1, \theta_0)$ can be built up for all values of $\epsilon$, we can
represent them all by the general distributional expression

[017 = [ orion, 


which defines the fiducial probability distribution $f_p(\theta \mid T)$.

From the point of view of mathematical theory $T$ is, so far, any statistic, but
Fisher restricted the term fiducial probability for those cases where $T$ was a
sufficient statistic for $\theta$, in order that the fiducial inference should be based on a
sample statistic which could justifiably claim to contain all the information
on $\theta$ available from the sample.

The general theory of interval estimation, without this restriction, has been
subsequently examined by Neyman (e.g. [10]) under the name of the theory of
confidence intervals. In this general theory there is no particular restriction
on the number of parameters involved, for it may be possible in the coordinate
space represented by parameters $\theta_r$ (for which there are statistics $T_r$) to define
a region $R(T_r)$ for which the assertion that the vector parameter $\theta_r$ lies in the
region $R(T_r)$ has a known probability $1 - \epsilon$ of being correct.

A difficulty, however, in a multi-parameter theory of fiducial distributions
is that it does not in general seem possible, even when $T_r$ is a vector statistic
representing a joint set of sufficient statistics for $\theta_r$, to define a simultaneous
fiducial distribution $f_p(\theta_r \mid T_r)$ which will be consistent with one-variate
distributions $f_p(\theta \mid T)$ relating to one particular parameter $\theta$. For such consistency
we must have the symbolic integration

$$\int f_p(\theta_r \mid T_r)$$

over all $\theta_r$ other than $\theta$ yielding $f_p(\theta)$ as a result. A further discussion of this
difficulty is given in Sections 4 and 5, after the theory of one-variate fiducial
distributions has been more completely discussed.


3. Fiducial Distributions and Properties of Sufficiency. If we now consider
the extension of one-variate fiducial inferences to the case where other
parameters exist but are unknown, we are led to examine the various types of
sufficient statistic which are related to the theory of estimation of one parameter
when other parameters are unspecified (Bartlett, [1]). By a distribution of
fiducial type we shall mean a distribution providing at least confidence
intervals in the sense of Neyman. This distribution will be defined as the fiducial
distribution for $\theta$ if the statistic used (conditional or unconditional) satisfies
the necessary sufficiency properties given in the paper just referred to (sections
6, p. 132, and 7, p. 136). This definition is understood to include the possibility
mentioned in section 7, where $T_1 \mid T_2(\theta)$ is a conditional statistic of the type
required for any specified value of $\theta$ ($T_1 \mid T_2$ denotes $T_1$, given $T_2$). For
example, the theoretical statistic $\bar x \mid \Sigma (x - m)^2$ in normal theory, where $\bar x$ is
the sample mean and $\Sigma (x - m)^2$ the sum of squares of deviations from the
population mean $m$, is of this form, and since

$$p(\bar x \mid \Sigma (x - m)^2) = p(t),$$

a fiducial distribution for $m$ is obtained from the familiar Student's $t$-distribution.
As other developments of fiducial theory we may note (i) its application to
fiducial inferences on sufficient statistics in unknown samples (this application
to normal samples has been discussed by more than one writer, see, for example,
Fisher [8]; I have moreover indicated the general theory underlying such
applications [3]); (ii) the case of discontinuous or "discrete" sampling
distributions, for which the theory of exact fiducial distributions breaks down.

In the latter case, it is only possible to choose an interval for $\theta$, such that
the chance of our fiducial inference being incorrect is not greater than $\epsilon$ (see,
for example, Clopper and Pearson [5]). This "inexact theory" I have shown [3]
may also be extended to inferences on sufficient statistics in unknown samples.
In particular, from the general distribution

$$p(r_1, r_2 \mid r) = \frac{n_1! \, n_2! \, (n - r)! \, r!}{(n_1 - r_1)! \, r_1! \, (n_2 - r_2)! \, r_2! \, n!}$$

giving the number of ways of assigning $r$ members with some attribute $A$ in
numbers $r_1$ and $r_2$ to samples $S_1$ and $S_2$, sizes $n_1$ and $n_2$, we have for $n_2 = 1$,

$$p(r_2 \mid r) = \frac{r}{n_1 + 1} \quad (r_2 = 1), \qquad p(r_2 \mid r) = \frac{n_1 + 1 - r}{n_1 + 1} \quad (r_2 = 0).$$

Thus if $S_1$ contains $r_1$ members with a certain attribute, such that

$$\frac{r_1 + 1}{n_1 + 1} \le \epsilon,$$

we may assert that a new member from the same population will not possess
the attribute. If

$$\frac{n_1 + 1 - r_1}{n_1 + 1} \le \epsilon,$$

we assert that the new member will possess the attribute. If $r_1$ does not conform
to either inequality, we cannot, with the limit of error imposed, commit
ourselves. The probability that our variable assertion, based on the above rule,
is wrong, is then not greater than $\epsilon$. (This type of inference may be contrasted
with the Law of Succession in the theory of inverse probability. In this rather
degenerate example it is not of course surprising that the nature of the inference
we can make is not always very profound!)
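The rule just described can be sketched as follows. The two inequalities are read here as $(r_1 + 1)/(n_1 + 1) \le \epsilon$ and $(n_1 + 1 - r_1)/(n_1 + 1) \le \epsilon$, which is an interpretation of the text rather than a quotation, and the function names are illustrative.

```python
from fractions import Fraction
from math import comb


def p_split(r1, r2, n1, n2):
    """p(r1, r2 | r): chance that r = r1 + r2 members with the attribute
    fall r1 in S1 (size n1) and r2 in S2 (size n2)."""
    return Fraction(comb(n1, r1) * comb(n2, r2), comb(n1 + n2, r1 + r2))


def assertion(r1, n1, eps):
    """Assertion about one new member (n2 = 1); None means no commitment."""
    if Fraction(r1 + 1, n1 + 1) <= eps:
        return "absent"
    if Fraction(n1 + 1 - r1, n1 + 1) <= eps:
        return "present"
    return None


eps = Fraction(1, 20)
chance_new_member_has_it = p_split(2, 1, 19, 1)  # r = 3, n1 = 19
```

With $n_1 = 19$ and $\epsilon = 1/20$ an assertion is possible only when $r_1 = 0$ or $r_1 = 19$, which illustrates the remark that the inference is not always very profound.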


4. Simultaneous Fiducial Distributions. It was pointed out in section 2
that an inference of fiducial type might be made regarding a joint interval
containing unknown parameters $\theta_r$, this interval or region being a variable function
of the (continuous) statistics $T_r$. If a sufficient set of statistics $T_r$ ($r = 1 \cdots k$)
exist for the parameters $\theta_r$ ($r = 1 \cdots k$), that is, if we have

$$p(S \mid \theta_r) = p(T_r \mid \theta_r) \, p(S \mid T_r)$$
where $T_r$ denotes the set $T_1 \cdots T_k$, and similarly for $\theta_r$; and if we can write

$$p(T_r \mid \theta_r) = p(\varphi_r)$$

where the distribution of the set of theoretical functions $\varphi_r$ of $T_r$ and $\theta_r$ is
independent to any further extent of $\theta_r$, then we may write also

$$f_p(\theta_r \mid T_r) = p(\varphi_r)$$

as the simultaneous fiducial distribution of the $\theta_r$ (cf. Fisher, [8]). This
notation allows implicitly for the formal transformation from one set of variates
to another, the last equation meaning that $p(\varphi_r)$ provides the fiducial
distribution of the $\theta_r$, when it is regarded as a distribution in $\theta_r$. For the equations
to hold, however, the Jacobian of the transformations must not change sign
anywhere in the sample space, this condition determining both the formal
identity of the two sides of the equations and also the necessary one-to-one
relationship between values of $\theta_r$ and $T_r$.

It has been shown by Segal [12] that if the sufficient set $T_r$ exist, the
functions $\varphi_r$ also exist. For we may define $\varphi_r$ by the equations

$$\varphi_1 = \int^{T_1} p(T_1), \qquad \varphi_2 = \int^{T_2} p(T_2 \mid T_1), \qquad \cdots$$

so that
$$p(T_1) \, p(T_2 \mid T_1) \cdots = d\varphi_1 \, d\varphi_2 \cdots \quad (\varphi_r,\ 0 - 1).$$
The above theory is also immediately applicable to quasi-sufficient statistics,
it being merely necessary to consider the appropriate conditional distributions.


5. Complete Simultaneous Fiducial Distributions. It has been emphasized
[3] that the simultaneous fiducial distribution $f_p(\theta_r \mid T_r)$ obtained from a
sufficient set of statistics must not be interpreted analogously to a simultaneous
distribution $p(T_r \mid \theta_r)$. For example, if the set $T_r$ represent the sufficient
statistics $\bar x$ and $s^2$ for the unknown mean and variance of a normal population
we have

$$p(\bar x, s^2 \mid m, \sigma^2) = p\!\left( \frac{\bar x - m}{\sigma}, \frac{s^2}{\sigma^2} \right) = f_p(m, \sigma^2 \mid \bar x, s^2),$$

but this does not imply that a fiducial inference could be made for one unknown
parameter defined by $\theta = m + \sigma$ by integration of the above fiducial
distribution after formal change of variable.

We may, however, in certain cases show that consistency relations are satisfied
which justify to a much further extent our calling $f_p(\theta_r \mid T_r)$ a simultaneous
fiducial distribution. Unfortunately this last expression has already been
appropriated for $f_p(\theta_r \mid T_r)$ in general; we shall therefore call $f_p(\theta_r \mid T_r)$ a complete
simultaneous fiducial distribution if (taking $k = 2$ for simplicity)

$$f_p(\theta_1, \theta_2) = f_p(\theta_1 \mid \theta_2) \, f_p(\theta_2) = f_p(\theta_2 \mid \theta_1) \, f_p(\theta_1),$$

where the fiducial distributions on the right are known to exist, and their form
determined, from the theory of one-variate fiducial distributions. For example,
if we consider again the normal sample, we have

$$p\!\left( \frac{\bar x - m}{\sigma}, \frac{s^2}{\sigma^2} \right) = p\!\left( \frac{\bar x - m}{\sigma} \right) p\!\left( \frac{s^2}{\sigma^2} \right) = f_p(m \mid \sigma^2) \, f_p(\sigma^2)$$

and also

$$= p\!\left( \frac{\bar x - m}{s} \right) p\!\left( \frac{\Sigma}{\sigma^2} \right) = f_p(m) \, f_p(\sigma^2 \mid m)$$

where $\Sigma = \Sigma (x - m)^2$.

These relations imply not only that a fiducial region for $m$ and $\sigma^2$ can be
determined from the observed values of $\bar x$ and $s^2$, but that in particular, the
region can be chosen so that (i) it is some section of an area bounded by two
lines parallel to the $m$ axis (ii) alternatively it is some section of an area bounded
by two lines parallel to the $\sigma^2$ axis. Integration for $m$ and $\sigma^2$ respectively then
implies extending these sections until the whole area bounded by these two
parallel lines is included in the chosen region. This existence of a complete
simultaneous fiducial distribution for the two population parameters
corresponding to a normal sample is a special case of the complete fiducial distribution
which exists for the two parameters of location and scaling for a sample from
any population of the form

$$p(x) = f\!\left( \frac{x - m}{\sigma} \right) \frac{dx}{\sigma},$$

as I have previously pointed out ([2], p. 564).³


















For let $T_1$ and $T_2$ be any two algebraically independent statistics giving
information on the two parameters, such that

$$p(S \mid m, \sigma) = p(T_1, T_2 \mid C, m, \sigma) \, p(C)$$

where $C$ represents the configuration of the sample (the idea of specifying the
configuration $C$ was first introduced by Fisher [7]). The above equation is
always possible, for if $x_1$ is the smallest observation, $x_2$ the next smallest and
so on, let

$$T_1 = x_1, \qquad T_2 = x_2 - x_1, \qquad T_r = (x_r - T_1)/T_2 \quad (r > 2).$$

Then $C = (T_r)$ is independent of $m$ and $\sigma$, and the quasi-sufficient set $T_1$,
$T_2$ will determine a simultaneous fiducial distribution for $m$ and $\sigma$ (the Jacobian
$\dfrac{\partial(\varphi_1, \varphi_2)}{\partial(m, \sigma)}$, where $\varphi_1 = \dfrac{T_1 - m}{\sigma}$, $\varphi_2 = \dfrac{T_2}{\sigma}$, is $\dfrac{T_2}{\sigma^3}$ and is always positive).

As further necessary conditions for $f_p(m, \sigma)$ to be complete, we have the
relations

$$p(T_1, T_2 \mid C, m, \sigma) = p(T_1 - m \mid \sigma, T_2, C) \, p(T_2 \mid C, \sigma)$$

$$= p\!\left( \frac{T_1 - m}{T_2} \,\Big|\, C \right) p\!\left( T_1 - m \,\Big|\, C, \frac{T_1 - m}{T_2}, \sigma \right).$$

The first of these relations is obvious, and since the first factor in it corresponds
to the quasi-sufficient statistic $T_1$ for $m$ when the configuration $C' = (C, T_2)$
is given ($\sigma$ known), we have

$$f_p(m, \sigma) = f_p(m \mid \sigma) \, f_p(\sigma).$$

For the second relation we note that the set $\dfrac{T_1 - m}{T_2}$ and $T_1 - m$
are algebraically equivalent to the set $T_1$ and $T_2$. Moreover, $\dfrac{T_1 - m}{T_2}$ is
independent of $\sigma$; and if $m$ is known, $(T_1 - m) \mid C''$, where $C'' = \left( C, \dfrac{T_1 - m}{T_2} \right)$,
is a quasi-sufficient statistic for $\sigma$. Hence

$$f_p(m, \sigma) = f_p(m) \, f_p(\sigma \mid m),$$
which is the relation required.


³ Cf. also Pitman [11], who does not, however, consider the point with which I am
concerned in this paper.

The theory of complete simultaneous fiducial distributions may be applied
to sufficient statistics in unknown samples. In particular, a complete
distribution may be shown to exist for the statistics $\bar x_2$ and $s_2^2$ in an unknown
normal sample $S_2$, or for the statistics $\bar x$ and $s^2$ for the joint sample $S$ of
which the known sample $S_1$ is also a part ([3]; cf. Fisher, [8]).


6. The Behrens-Fisher Test between two means. Fisher [8] showed that 
by integrating out the simultaneous fiducial distribution f,(m, o°) obtained 
from a normal sample, we obtained either f,(m) or fp(o’). He then suggested 
that such integration was possible for any simultaneous fiducial distribution; 
and hence obtained a distribution apparently appropriate for testing the differ- 
ence between two means from normal populations whose variances were 
unequal. Since I have shown that this integration can be justified for 
fp(m, o”) owing to the complete simultaneous nature of this distribution, it is 
clear that integration in any other problem is so far justified merely by analogy, 
and no statement as to its meaning in general has been given by Fisher. | 

To show more explicitly the extent to which the proposed solution is open to
criticism, I examined in particular [2] the case where each estimated variance
had only one degree of freedom. The Behrens-Fisher solution implies a fiducial
distribution

$$f_F(\delta) = \frac{(s_1 + s_2)\, d\psi}{\pi\{(s_1 + s_2)^2 + \psi^2\}},$$

where $\delta$ is the difference in population means $m_1 - m_2$,
$\psi = (m_1 - x_1) - (m_2 - x_2)$, and $x_1$ and $x_2$ are sample means with
estimated variances $s_1^2$ and $s_2^2$, each based on only one degree of freedom.
By direct argument, I derived a distribution of fiducial type

$$f_F(\delta) = \frac{|s_1 \pm s_2|\, d\psi}{\pi\{(s_1 \pm s_2)^2 + \psi^2\}},$$

where the sign + or − is to be decided at random. It is irrelevant to my
argument whether we are justified in calling this distribution the fiducial
distribution of $\delta$; it is also irrelevant what distribution would ensue if the
+ and − signs were considered separately. It is sufficient to note that the
distribution certainly provides us with an exact inference of fiducial type, as
Fisher himself confirmed ([9], p. 375); and this inference clashes with the
apparent inference to be drawn from the Behrens-Fisher solution. In general it is
of course true that different distributions might validly lead to different
inferences of fiducial type, but here the distributions are sufficiently similar
mathematically for it to be possible to assert that they cannot both be correct.
The direct distribution of $\psi/(s_1 + s_2)$ is in fact known to be dependent on the
unknown ratio $\phi$ of the population variances (Fisher, [9], p. 374). While Fisher
suggests that this in no way invalidates his fiducial argument, in my view if an
inference is to be independent of an unknown parameter, it should in particular
be independent of it if we imagine that we are being supplied with pairs of
samples, for all of which the ratio $\phi$ has the same value.
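
The two distributions of fiducial type discussed in this section are both of Cauchy form, so their clash can be illustrated numerically. The following is only a sketch under stated assumptions: the function names are hypothetical, the "sign decided at random" is read here as an equal-weight mixture of the two scales, and $s_1 \ne s_2$ is assumed so that both scales are positive.

```python
import math


def behrens_fisher_density(psi, s1, s2):
    """Cauchy-type density in psi with scale s1 + s2 (one degree of freedom each)."""
    scale = s1 + s2
    return scale / (math.pi * (scale ** 2 + psi ** 2))


def random_sign_density(psi, s1, s2):
    """Assumed reading of the random sign: equal-weight mixture of the
    Cauchy densities with scales |s1 + s2| and |s1 - s2| (requires s1 != s2)."""
    total = 0.0
    for scale in (abs(s1 + s2), abs(s1 - s2)):
        total += 0.5 * scale / (math.pi * (scale ** 2 + psi ** 2))
    return total


def central_probability(scale, half_width):
    """P(|psi| <= half_width) for a Cauchy density with the given scale."""
    return 2.0 / math.pi * math.atan(half_width / scale)


if __name__ == "__main__":
    s1, s2 = 1.0, 2.0
    width = s1 + s2
    # Probability assigned to |psi| <= s1 + s2 by each distribution.
    p_bf = central_probability(s1 + s2, width)
    p_mix = 0.5 * (central_probability(s1 + s2, width)
                   + central_probability(abs(s1 - s2), width))
    print(p_bf, p_mix)
```

Under these assumptions the same interval receives different probabilities from the two distributions, which is the sense in which the inferences clash.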


7. Approximate Tests. I have shown ([2], p. 565) that while $f_F(\delta)$ in general
does not appear to exist, we have

$$f_F(\delta, \phi) = f_F(\delta \mid \phi)\, f_F(\phi),$$

where

$$f_F(\delta \mid \phi) = C \left[1 + \frac{\phi\,\psi^2}{(1 + \phi)(n_1 s_1^2 + n_2 \phi s_2^2)}\right]^{-\frac12 (n_1 + n_2 + 1)} \frac{\sqrt{\phi}\; d\psi}{\sqrt{(n_1 s_1^2 + n_2 \phi s_2^2)}\,\sqrt{1 + \phi}},$$

where $n_1$ and $n_2$ are the degrees of freedom of $s_1^2$ and $s_2^2$, and $C$ is a constant.
For $n_1 = n_2$, the fiducial limits for $\delta$ (if $\phi$ were known) were shown to be
insensitive to changes in $\phi$, as has also been shown by Welch [15] in more detail.
For $n_1 \ne n_2$ this is no longer the case. If we tried to get an approximate
solution we might consider inserting $\hat\phi = s_1^2/s_2^2$ for $\phi$ in the above
distribution; this would be equivalent to considering the (direct) distribution of

$$T = \frac{x_1 - x_2}{\sqrt{(s_1^2 + s_2^2)}}$$

as a t-distribution with $n_1 + n_2$ degrees of freedom. This is therefore a first
approximation to the true distribution of $T$, which has been obtained by Welch
[15] to a further approximation involving $\phi$.
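
The statement that inserting $s_1^2/s_2^2$ for $\phi$ leads to $T$ can be checked with a few lines of arithmetic. The sketch below rests on assumptions not spelled out in the text: $\phi$ is taken as the ratio of the population variances of the two means, and $n_i s_i^2/\sigma_i^2$ is taken to be a chi-square variate with $n_i$ degrees of freedom, so that the pooled statistic has $n_1 + n_2$ degrees of freedom. The function names are hypothetical.

```python
import math


def t_squared_given_phi(psi, s1_sq, s2_sq, n1, n2, phi):
    """Squared pooled t statistic for psi when the variance ratio phi is known
    (assumed form: phi = sigma_1^2 / sigma_2^2)."""
    pooled = n1 * s1_sq + n2 * phi * s2_sq
    return (n1 + n2) * phi * psi ** 2 / ((1.0 + phi) * pooled)


def plug_in_T_squared(psi, s1_sq, s2_sq):
    """Squared statistic T = psi / sqrt(s1^2 + s2^2)."""
    return psi ** 2 / (s1_sq + s2_sq)


if __name__ == "__main__":
    psi, s1_sq, s2_sq, n1, n2 = 1.5, 2.0, 0.5, 4, 9
    phi_hat = s1_sq / s2_sq
    # With phi replaced by s1^2/s2^2 the two quantities coincide.
    print(t_squared_given_phi(psi, s1_sq, s2_sq, n1, n2, phi_hat),
          plug_in_T_squared(psi, s1_sq, s2_sq))
```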

Sometimes it is sufficient in practice if we can assign limits to the true
significance level of $T$ in any problem, as was illustrated in my own paper ([2],
p. 566). A formal proof of the inequality used there is as follows.

The actual distribution of JT for nm. = nz, = n, say, depends on the integral 

10 = [ cae Oa 
0 (L+7?/n)*** (1+ 6/9)" 
where 
i? = _ 2e(1 + 6) 
(1 + ¢)(@ + ¢) 


and hence the significance level of $T$ on the integral

$$J(\phi) = \int_0^{|T|} I(\phi)\, dT.$$

If we write

$$u = \begin{cases} \theta/\phi, & (0 < \theta/\phi \le 1) \\ \phi/\theta, & (1 < \theta/\phi < \infty) \end{cases}$$

we obtain


|T| Ae 3e- 1 
J($) = f [ ba +X a Ta 4 aba ra 
that is, 


$$\lambda_1^2 = \frac{2(1 + u\phi)}{(1 + u)(1 + \phi)}, \qquad \lambda_2^2 = \frac{2(u + \phi)}{(1 + u)(1 + \phi)},$$


where 


a [ (F(t) + F(e)}u"(1 + ou) du 


where $t_1 = \lambda_1 |T|$, $t_2 = \lambda_2 |T|$, and $F(t)$ is proportional to the probability
integral of a $t$ with $n$ degrees of freedom. Since


a(F(h) + F(t) . f 1 1 be! (1 —u)|T| 
er oe \ro(1 + 03 T?/n)"*t ~ ( + 03 T?/n)"f (1 + a) + oF 


this differential coefficient, from the relations

$$(1 - u)(1 - \phi) \ge 0, \qquad 1 + u\phi \ge u + \phi, \qquad \lambda_1 \ge \lambda_2,$$

is never negative for all $u$ and $\phi$ ($\phi < 1$). Hence $J(\phi)$ is a steadily increasing
function in the range (0, 1) for all values of $T$; or the significance level of $T$ lies
between its values for $\phi = 0$ and $\phi = 1$, as previously stated.

More generally, for $n_1 \ne n_2$, the effective number of degrees of freedom for $T$
would be expected to lie between $n_1$ ($n_1 < n_2$) and $n_1 + n_2$ (cf. Welch, [15], p. 360),
though I have not succeeded in establishing this rigorously by a modification
of the above proof.

ON TESTS OF SIGNIFICANCE IN TIME SERIES
By G. TINTNER


The purpose of this note is to give some tests of significance for problems 
connected with time series. H. Wold [1], in a recent book gives an excellent 
theoretical treatment of this subject (without treating, however, the important 
problem of the trend), but he does not give any tests of significance, [2]. These 
have proved extremely important in other fields, especially in biological appli- 
cations. The method used in what follows may be found useful also for other 
problems in time series. 


1. A Test of Significance for the Variances of Differences. The Variate
Difference Method, [3, 4, 5], starts from the assumption that the time series
$w_i$ ($i = 1, 2, \ldots, N$) consists of two additive parts: a "smooth" part $m_i$, the
mathematical expectation of $w_i$, and a random part $x_i$, which we will assume
to be normally and independently distributed with mean 0 and variance $\sigma^2$.
Hence we have

$$w_i = m_i + x_i, \tag{1}$$

if we have N items in our series.
We form the finite differences and get for the difference of order k

$$\Delta^k w_i = \Delta^k m_i + \Delta^k x_i. \tag{2}$$

But the smooth component or mathematical expectation can be eliminated to
any desired degree by successive differencing. This would not be true of a
"zig-zag" component or a periodic function with small period [6]. It will be
remembered that, for instance, the differences of order k of a polynomial of
order k are constant and that differences of order (k + 1) and higher are zero.
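
The elimination of a polynomial component by differencing, and its failure for a "zig-zag" component, can be illustrated in a few lines. This is a minimal sketch; the function name `difference` is hypothetical and the example series are chosen only for illustration.

```python
def difference(series, k=1):
    """Return the k-th finite differences of a sequence of numbers."""
    values = list(series)
    for _ in range(k):
        values = [b - a for a, b in zip(values, values[1:])]
    return values


if __name__ == "__main__":
    cubic = [i ** 3 for i in range(8)]       # polynomial of order 3
    print(difference(cubic, 3))              # constant third differences
    print(difference(cubic, 4))              # fourth differences vanish
    zigzag = [(-1) ** i for i in range(8)]   # alternating component
    print(difference(zigzag, 2))             # amplitude grows instead of shrinking
```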

O. Anderson and others who worked in this field have tested the order of the
difference, say $k_0$, beginning from where the component $m_i$ is sufficiently
eliminated, in the following way. They define the variance $\sigma_k^2$ of the k-th
difference by

$$\sigma_k^2 = \sum (\Delta^k w_i)^2 \Big/ \left((N - k)\,{}_{2k}C_k\right). \tag{3}$$

We note that all variances of the differences beginning from $k_0$ must be equal
to each other, because they will contain only the component $x_i$, if the other
component $m_i$ has been eliminated through taking $k_0$ differences. O. Anderson
and R. Zaykoff, [3, 7], give formulae for the standard errors of the difference
between the variances of the differences k and k + 1. These formulae are
valid only for large samples and suppose a knowledge of the true variance $\sigma^2$.

We propose a new method for testing the equality of the variances of two
successive differences in order to find the order of the difference $k_0$ beginning
from which we have

$$\sigma_{k_0}^2 = \sigma_{k_0+1}^2 = \sigma_{k_0+2}^2 = \cdots \tag{4}$$

This method is one of selection and it consists in selecting the items to be
included in the variance of the k-th difference, $\sigma_k^2$, in such a way that they
become independent of the items to be included in the calculation of the variance
of the difference of order k + 1, $\sigma_{k+1}^2$. Then the ordinary test of significance,
i.e. the one involved in the analysis of variance as developed by R. A. Fisher [8],
becomes applicable.

Let us consider an example. Suppose we want to compare the variance of
the first differences and of the second differences, in order to test the hypothesis
that the component $m_i$ has already been eliminated in the first difference. But
the process of forming finite differences has introduced correlations, even if the
original random elements $x_i$ are independently distributed. Each item in the
series of the first differences will be correlated with the next and the preceding
item. Each item in the series of second differences will be correlated with the
two preceding and the two following items of the same series. But each item
of the series of the first differences will also be correlated with the two
preceding, the corresponding and the following item of the series of second
differences.

We can make a very simple valid comparison in spite of these correlations
if we sacrifice some of the available information. We can for instance
calculate $\sigma_1^2$ by including only items number 1, 6, 11, 16 etc. of the series of first
differences. And we calculate $\sigma_2^2$ by including only items number 3, 8, 13, 18
etc. of the series of second differences. The two quantities $\sigma_1^2$ and $\sigma_2^2$ are
independent and hence can be compared by using either Fisher's z test, [8],
or Snedecor's F table [9]. The variances are

$$\sigma_1^2 = {\sum}' (\Delta w_i)^2 \Big/ \left(\tfrac{N - 1}{5}\,{}_{2}C_1\right) \tag{5}$$

and

$$\sigma_2^2 = {\sum}'' (\Delta^2 w_i)^2 \Big/ \left(\tfrac{N - 2}{5}\,{}_{4}C_2\right) \tag{6}$$

where $\sum'$ and $\sum''$ denote summation over the selected items. Other selections
which are possible are: Items number 2, 7, 12 etc. of the series of first
differences and items number 4, 9, 14 etc. of the series of second differences. Or
items number 3, 8, 13 etc. of the series of the first differences and items number
5, 10, 15 etc. of the series of second differences. Or items number 4, 9, 14 etc.
of the series of first differences and items number 6, 11, 16 etc. of the series of
second differences. Finally, items number 5, 10, 15 etc. of the series of first
differences and items number 7, 12, 17 etc. of the series of second differences.
These 5 selections are of course not independent of each other. The
comparison can always be made by calculating the variances according to formulae
(5) and (6) and using either Fisher's z table, [8], or Snedecor's F table, [9],
for (N − 1)/5 and (N − 2)/5 degrees of freedom. If N is large enough, these
two numbers will be near enough together in order to use the property of the z
distribution to become normal for equal degrees of freedom, [8]. Then we can
assume that $z = (\log_e \sigma_1^2 - \log_e \sigma_2^2)/2$ is normally distributed with mean zero
and standard error $\sqrt{5/(N - 2)}$.

Should the test turn out positive, i.e. if the difference between the variances 
is greater than permitted from the point of view of certain significance levels, 
then we have to compare the variance of the second and the third differences, 
by selecting items in a similar manner and so on. 

The general procedure is as follows: If we want to compare the variance of
the difference number k and the difference number k + 1, we find that we can
only use a part of our available series, because we must make a selection in
order to get two independent estimates. We can make 2k + 3 different
selections, which are not independent but each give two unbiased, independent
estimates of the variances of the differences k and k + 1. The selections
consist in taking items number j, j + (2k + 3), j + 2(2k + 3), j + 3(2k + 3)
etc. of the series of k-th differences and items number j + k + 1,
j + k + 1 + (2k + 3), j + k + 1 + 2(2k + 3), j + k + 1 + 3(2k + 3) etc. of the
series of (k + 1)-th differences. Here j is equal to 1, 2, 3, ..., 2k + 3, giving
2k + 3 possible selections for the comparison.
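
The selection rule just described can be written out as a short routine. The sketch below uses hypothetical names (`selected_items`, `observations_used`); item numbers are 1-based as in the text, and the second helper merely lists which original observations $w_i$ enter each chosen difference, so that the absence of overlap between the two selections can be checked.

```python
def selected_items(n_obs, k, j):
    """Item numbers (1-based) chosen from the k-th and (k+1)-th difference series.

    n_obs is N, the length of the original series; j runs from 1 to 2k + 3.
    """
    step = 2 * k + 3
    if not 1 <= j <= step:
        raise ValueError("j must lie between 1 and 2k + 3")
    # The k-th differences have N - k items, the (k+1)-th have N - k - 1.
    lower = list(range(j, n_obs - k + 1, step))
    higher = list(range(j + k + 1, n_obs - k, step))
    return lower, higher


def observations_used(items, order):
    """Original observations entering the listed differences of a given order."""
    used = set()
    for i in items:
        used.update(range(i, i + order + 1))
    return used


if __name__ == "__main__":
    first, second = selected_items(20, k=1, j=1)
    print(first, second)
    # No observation is shared between the two selections.
    print(observations_used(first, 1) & observations_used(second, 2))
```

With N = 20, k = 1 and j = 1 this reproduces the items 1, 6, 11, 16 of the first differences and 3, 8, 13, 18 of the second differences quoted in the example.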


The variances of the difference number k and k + 1 are calculated according
to the formulae

$$\sigma_k^2 = {\sum}' (\Delta^k w_i)^2 \Big/ \left(\tfrac{N - k}{2k + 3}\,{}_{2k}C_k\right) \tag{7}$$

$$\sigma_{k+1}^2 = {\sum}'' (\Delta^{k+1} w_i)^2 \Big/ \left(\tfrac{N - k - 1}{2k + 3}\,{}_{2k+2}C_{k+1}\right). \tag{8}$$

The summations are again taken over the selected items and we can make
an ordinary analysis of variance with Fisher's z table or Snedecor's F table
entering it for (N − k)/(2k + 3) and (N − k − 1)/(2k + 3) degrees of freedom.
If N is appreciably large, we can assume the number of degrees of freedom as
equal and $z = (\log_e \sigma_k^2 - \log_e \sigma_{k+1}^2)/2$ is normally distributed about zero with
a standard error of $\sqrt{(2k + 3)/(N - k - 1)}$.
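
A compact sketch of the whole comparison may be helpful. It is not the author's computing scheme: the names are hypothetical, and the divisor uses the actual number of selected items, which agrees with (N − k)/(2k + 3) and (N − k − 1)/(2k + 3) only when these are whole numbers.

```python
import math


def kth_difference(series, k):
    """k-th finite differences of the series."""
    values = list(series)
    for _ in range(k):
        values = [b - a for a, b in zip(values, values[1:])]
    return values


def selected_variance(series, k, start, step):
    """Variance estimate from every step-th item of the k-th differences,
    beginning with item number `start` (1-based), as in formulae (7) and (8)."""
    picked = kth_difference(series, k)[start - 1::step]
    return sum(d * d for d in picked) / (len(picked) * math.comb(2 * k, k))


def z_statistic(series, k, j=1):
    """Return z and its large-sample standard error for differences k and k + 1."""
    step = 2 * k + 3
    var_k = selected_variance(series, k, j, step)
    var_k1 = selected_variance(series, k + 1, j + k + 1, step)
    z = 0.5 * (math.log(var_k) - math.log(var_k1))
    std_error = math.sqrt(step / (len(series) - k - 1))
    return z, std_error


if __name__ == "__main__":
    squares = [i * i for i in range(1, 12)]   # a trend not yet removed at k = 1
    print(z_statistic(squares, k=1))
```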


2. The Distribution of the Serial Co-variance. A similar method yields the
distribution of the serial covariance, i.e., the product of a random series with
itself if lagged by a lag L. We assume that $x_i$, $i = 1, \ldots, N$, is a series of N
terms which are normally and independently distributed with mean zero and
variance one. We form the serial covariance w by lagging it by L terms and
make a selection. We only include the products number 1, 1 + (L + 1),
1 + 2(L + 1), 1 + 3(L + 1) etc. The following formulae are exact only if N
is a multiple of L + 1, otherwise they have to be regarded as approximations.
The serial covariance w is

$$w = \left(x_1 x_{L+1} + x_{L+2}\, x_{2L+2} + x_{2L+3}\, x_{3L+3} + \cdots\right) \Big/ \left(\tfrac{N}{L + 1}\right). \tag{9}$$


We shall use the method of characteristic functions, [10], in order to establish
the distribution of w. The characteristic function is in our case

$$E(e^{iyw}) = \varphi(y) = \int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty} e^{iyw} f \, dx_1 \cdots dx_N \tag{10}$$

where f is the distribution function of the $x_i$, i.e., a distribution of N normal
and independent variates with zero means and unit variances.
An orthogonal transformation of the quadratic form in the exponent yields
a determinant, which consists of N/(L + 1) steps each of the form

$$\begin{vmatrix} 1 & 0 & \cdots & -i(L+1)y/N \\ 0 & 1 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ -i(L+1)y/N & 0 & \cdots & 1 \end{vmatrix} = 1 + \frac{(L + 1)^2 y^2}{N^2}. \tag{11}$$

The characteristic function is therefore given by

$$\varphi(y) = \left[1 + \frac{(L + 1)^2 y^2}{N^2}\right]^{-\frac12 N/(L+1)} \tag{12}$$

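and the distribution of w, say D(w), is given, [11], by

$$D(w) = \frac{1}{2\pi} \int_{-\infty}^{+\infty} e^{-iwy} \varphi(y)\, dy = \frac{N}{(L+1)\sqrt{\pi}\,\Gamma\!\left(\frac{N}{2(L+1)}\right)} \left(\frac{N|w|}{2(L+1)}\right)^{\nu} K_{\nu}\!\left(\frac{N|w|}{L+1}\right), \qquad \nu = \frac{N - L - 1}{2(L+1)}, \tag{13}$$

where K is a Bessel function of the second kind for a purely imaginary
argument, [12].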
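
The selection in (9) and the characteristic function (12) are easily put into code. The sketch below uses hypothetical function names; the variance is recovered from (12) by a numerical second derivative at the origin and compared with (L + 1)/N, the value implied by averaging N/(L + 1) independent products of unit variance.

```python
def serial_covariance(x, lag):
    """Serial covariance from non-overlapping products x_i * x_(i+lag),
    taking products number 1, 1 + (lag + 1), 1 + 2(lag + 1), ... as in (9)."""
    step = lag + 1
    starts = range(0, len(x) - lag, step)   # zero-based positions
    products = [x[i] * x[i + lag] for i in starts]
    return sum(products) / len(products)


def characteristic_function(y, n, lag):
    """Characteristic function (12) of the serial covariance."""
    a = (lag + 1) * y / n
    return (1.0 + a * a) ** (-n / (2.0 * (lag + 1)))


def variance_from_cf(n, lag, h=1e-3):
    """Variance obtained as minus the second derivative of (12) at y = 0."""
    return 2.0 * (1.0 - characteristic_function(h, n, lag)) / (h * h)


if __name__ == "__main__":
    print(serial_covariance([1, 2, 3, 4, 5, 6], lag=2))
    print(variance_from_cf(12, 2), (2 + 1) / 12)
```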

We can also get from (12) an asymptotic formula for large N. In this case w
is distributed normally about zero with a variance of (L + 1)/N.

ON AN INTEGRAL EQUATION IN POPULATION ANALYSIS
By Alfred J. Lotka


I 


A fundamental equation in population analysis rests on the following
considerations: Of the persons born a years ago a certain fraction p(a),
ascertainable for example by means of a life table, survives to age a, and forms the
a-year-old contingent of the existing population. A similar remark applies
to every age of life. If, therefore, we denote by N(t) the number of the
population at time t, and by B(t) the annual rate of births at the same time, and if
we are dealing with a closed population, that is, one exempt from immigration
and emigration, then, evidently,

$$N(t) = \int_0^\infty B(t - a)\, p(a)\, da. \tag{1}$$

In general p(a) may be a function of t also, but we shall here consider primarily
the case where p(a) does not contain t explicitly.

The function p(a) being known (from a life table), if B(t) is given as a
function of t, then N(t) follows by direct integration of the right hand member of (1).

If, on the contrary, N(t) is given, and B(t) is to be determined, a special
problem arises. On a former occasion¹ I have given a solution for cases in
which the function N(t) is given or can be expanded in the form of a series
proceeding in ascending powers of $e^{rt}$, where r is constant; and, more particularly,
for the case in which N(t) is the logistic function

$$N(t) = \frac{N_\infty}{1 + e^{-rt}} = N_\infty \left(e^{rt} - e^{2rt} + e^{3rt} - \cdots\right). \tag{2}$$

Although N(t) is expanded in an exponential series in the process of obtaining
the solution by this method, in the final result these terms are reunited, and
only the original function N(t) as such, together with its derivatives, appears.
This suggests that it should be possible to obtain the result by a more direct
route, retaining the function in its original form throughout the process. This
is indeed the case, as will now be shown, by a method which at the same time
frees us from the assumption that N(t) can be represented by an exponential
series in powers of $e^{rt}$.

This is accomplished as follows:


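¹ A. J. Lotka, Proc. Natl. Acad. Sci., 1929, vol. 15, p. 793; Human Biology, 1931, vol. 3,
p. 459.

Let us put

$$N(t) = \varphi_0(t) \tag{3}$$

and assume that B(t) can be expressed as a series in $\varphi_0(t)$ and its derivatives,
thus

$$B(t) = c_0 \varphi_0(t) - c_1 \varphi_1(t) + \frac{c_2}{2!} \varphi_2(t) - \frac{c_3}{3!} \varphi_3(t) + \cdots \tag{4}$$

Then

$$\begin{aligned} B(t - a) ={}& c_0 \left\{\varphi_0(t) - a \varphi_1(t) + \frac{a^2}{2!} \varphi_2(t) - \frac{a^3}{3!} \varphi_3(t) + \cdots\right\} \\ &- c_1 \left\{\varphi_1(t) - a \varphi_2(t) + \frac{a^2}{2!} \varphi_3(t) - \cdots\right\} + \cdots \end{aligned} \tag{5}$$

where $\varphi_n(t)$ denotes the nth derivative of $\varphi_0(t)$.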
Introducing (5) in (1), and carrying out the integration, we obtain

$$\begin{aligned} \varphi_0(t) ={}& c_0 m_0 \varphi_0(t) - \{c_1 m_0 + c_0 m_1\} \varphi_1(t) + \frac{1}{2!} \{c_2 m_0 + 2 c_1 m_1 + c_0 m_2\} \varphi_2(t) \\ &- \frac{1}{3!} \{c_3 m_0 + 3 c_2 m_1 + 3 c_1 m_2 + c_0 m_3\} \varphi_3(t) + \cdots \end{aligned} \tag{6}$$

where $m_n$ denotes the nth moment of the function p(a) about the origin of a,
that is,

$$m_n = \int_0^\infty a^n p(a)\, da. \tag{7}$$

Equation (6) is satisfied by putting

$$\begin{aligned} 1 &= c_0 m_0 \\ 0 &= c_0 m_1 + c_1 m_0 \\ 0 &= c_0 m_2 + 2 c_1 m_1 + c_2 m_0 \\ 0 &= c_0 m_3 + 3 c_1 m_2 + 3 c_2 m_1 + c_3 m_0 \end{aligned} \tag{8}$$

the numerical coefficients being those of the corresponding binomial expansion.
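
The system (8) can be solved step by step, each new coefficient following from those already found. The following sketch does this in exact rational arithmetic; the function name is hypothetical, and the test case $p(a) = e^{-a}$ (moments $m_n = n!$) is chosen only because its answer is known in closed form, namely $1/\int e^{-ra} e^{-a}\, da = 1 + r$.

```python
from fractions import Fraction
from math import comb


def coefficients_from_moments(moments):
    """Solve the binomial system (8) for c_0, c_1, ... given m_0, m_1, ...

    For n >= 1 the n-th equation reads  sum_k C(n, k) m_k c_(n-k) = 0.
    """
    m = [Fraction(v) for v in moments]
    c = [1 / m[0]]
    for n in range(1, len(m)):
        known = sum(comb(n, k) * m[k] * c[n - k] for k in range(1, n + 1))
        c.append(-known / m[0])
    return c


if __name__ == "__main__":
    # p(a) = exp(-a): moments 1, 1, 2, 6, 24.
    print(coefficients_from_moments([1, 1, 2, 6, 24]))
```

In this illustrative case only $c_0$ and $c_1$ differ from zero, in agreement with the series $c_0 - c_1 r + \cdots = 1 + r$.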
Now consider

$$\beta(r) = \frac{1}{\int_0^\infty e^{-ra} p(a)\, da} = \frac{1}{m_0 - m_1 r + \frac{m_2}{2!} r^2 - \cdots} = C_0 - C_1 r + \frac{C_2}{2!} r^2 - \cdots \tag{9}$$

This gives

$$\begin{aligned} 1 &= C_0 m_0 \\ 0 &= C_0 m_1 + C_1 m_0 \\ 0 &= C_0 m_2 + 2 C_1 m_1 + C_2 m_0 \\ 0 &= C_0 m_3 + 3 C_1 m_2 + 3 C_2 m_1 + C_3 m_0 \end{aligned} \tag{10}$$

from which it is seen that the coefficients c in equation (6) are identical with
the coefficients C in equation (9), that is, they are the coefficients of successive
powers of r in the expansion of

$$\beta(r) = \frac{1}{\int_0^\infty e^{-ra} p(a)\, da} \tag{11}$$

as a power series in r.
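These coefficients can also be conveniently expressed in terms of the Thiele
seminvariants $\lambda$ of the function p(a), which are defined by

$$\int_0^\infty e^{-ra} p(a)\, da = m_0 \exp\left(-\lambda_1 r + \frac{\lambda_2}{2!} r^2 - \frac{\lambda_3}{3!} r^3 + \cdots\right) = m_0 - m_1 r + \frac{m_2}{2!} r^2 - \frac{m_3}{3!} r^3 + \cdots \tag{12}$$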

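Differentiating the right hand member of (12) we have

$$\begin{aligned} m_0 \left(\lambda_1 - \lambda_2 r + \frac{\lambda_3}{2!} r^2 - \cdots\right) & \exp\left(-\lambda_1 r + \frac{\lambda_2}{2!} r^2 - \cdots\right) \\ &= \left(\lambda_1 - \lambda_2 r + \frac{\lambda_3}{2!} r^2 - \cdots\right) \left(m_0 - m_1 r + \frac{m_2}{2!} r^2 - \cdots\right) \\ &= m_1 - m_2 r + \frac{m_3}{2!} r^2 - \cdots \end{aligned} \tag{13}$$

$$\begin{aligned} m_1 &= \lambda_1 m_0 \\ m_2 &= \lambda_1 m_1 + \lambda_2 m_0 \\ m_3 &= \lambda_1 m_2 + 2 \lambda_2 m_1 + \lambda_3 m_0 \\ m_4 &= \lambda_1 m_3 + 3 \lambda_2 m_2 + 3 \lambda_3 m_1 + \lambda_4 m_0 \end{aligned} \tag{14}$$

again with binomial coefficients.
The $\lambda$'s being thus defined, we now have

$$\frac{1}{\int_0^\infty e^{-ra} p(a)\, da} = \frac{1}{m_0} \exp\left(\lambda_1 r - \frac{\lambda_2}{2!} r^2 + \cdots\right) \tag{15}$$

$$= c_0 - c_1 r + \frac{c_2}{2!} r^2 - \cdots \tag{16}$$

from which it follows, as in the case of equations (12), (13), that

$$\begin{aligned} c_0 &= \frac{1}{m_0} \\ c_1 &= -\lambda_1 c_0 \\ c_2 &= -\lambda_1 c_1 - \lambda_2 c_0 \\ c_3 &= -\lambda_1 c_2 - 2 \lambda_2 c_1 - \lambda_3 c_0 \\ c_4 &= -\lambda_1 c_3 - 3 \lambda_2 c_2 - 3 \lambda_3 c_1 - \lambda_4 c_0 \\ c_5 &= -\lambda_1 c_4 - 4 \lambda_2 c_3 - 6 \lambda_3 c_2 - 4 \lambda_4 c_1 - \lambda_5 c_0 \end{aligned} \tag{17}$$

once more with binomial coefficients. The coefficients c are, in fact, related
to the negative seminvariants $-\lambda$ in the same way as the moments m are related
to the direct seminvariants.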
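
Since the moments m and the coefficients c obey the same binomial recurrence, one routine serves for both: fed with the seminvariants and $m_0$ it returns the moments, fed with the negative seminvariants and $1/m_0$ it returns the coefficients c. The sketch below uses a hypothetical function name and, as an illustration only, the seminvariants 1, 1, 2, 6 of $p(a) = e^{-a}$.

```python
from fractions import Fraction
from math import comb


def binomial_recurrence(first, seminvariants):
    """Apply  x_(n+1) = sum_k C(n, k) * s_(k+1) * x_(n-k)  starting from x_0 = first.

    seminvariants[0] is the first seminvariant, seminvariants[1] the second, etc.
    """
    s = [Fraction(v) for v in seminvariants]
    x = [Fraction(first)]
    for n in range(len(s)):
        x.append(sum(comb(n, k) * s[k] * x[n - k] for k in range(n + 1)))
    return x


if __name__ == "__main__":
    lam = [1, 1, 2, 6]                      # seminvariants of exp(-a)
    moments = binomial_recurrence(1, lam)   # m_0 = 1
    print(moments)
    # The same recurrence with -lambda and 1/m_0 gives c_0, c_1, ...
    print(binomial_recurrence(1 / moments[0], [-v for v in lam]))
```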

Considerable simplification in the coefficients c can be effected by a change
in origin of t. This is most easily accomplished by reverting to equation (1)
in which we write, instead of B(t − a), the equivalent expression

$$B(t - a) = B\{(t - \lambda_1) - (a - \lambda_1)\} = B(t' - a'). \tag{18}$$

In place of the moments m of the function p(a), taken about a = 0, there then
appear in (6) the corresponding moments taken about the mean age $\lambda_1$, and
in the equation corresponding to (17), for the new coefficients $c_0', c_1', c_2', \cdots$ the
seminvariants $\lambda$ are now defined in terms of these new moments. According
to a well-known property of the Thiele seminvariants this leaves all the $\lambda$'s
except $\lambda_1$ unchanged, while reducing this latter to zero.

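The coefficients c′ are therefore obtained by a set of equations identical with
those for the coefficients c, in which, however, the substitution $\lambda_1 = 0$ eliminates
all terms containing either $\lambda_1$ or $c_1'$, thus

$$\begin{aligned} c_2' &= -\lambda_2 c_0 \\ c_3' &= -\lambda_3 c_0 \\ c_4' &= -(\lambda_4 - 3 \lambda_2^2) c_0 \\ c_5' &= -(\lambda_5 - 10 \lambda_2 \lambda_3) c_0 \end{aligned} \tag{19}$$

With this choice of constants the solution (4) of the fundamental equation
(1) finally takes the form

$$B(t) = \frac{1}{m_0} \left\{\varphi(t + \lambda_1) - \frac{\lambda_2}{2!} \varphi_2(t + \lambda_1) + \frac{\lambda_3}{3!} \varphi_3(t + \lambda_1) - \frac{\lambda_4 - 3 \lambda_2^2}{4!} \varphi_4(t + \lambda_1) + \cdots\right\}. \tag{20}$$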


It is thus seen that if the population, as a function of the time, is represented
by $\varphi(t)$, and expansion by Taylor's theorem is applicable to $\varphi(t - a)$ within the
range $0 < a < \omega$ where $\omega$ is the highest age attained by any individual, i.e.,
the highest age for which p(a) has a value other than zero, then the annual
births B(t) under the régime of a constant life table can be represented by a
series (20) proceeding in successive derivatives of $\varphi$. The constant coefficients
of the successive members of the series are known functions² of the Thiele
seminvariants $\lambda$ of the function p(a), the probability at birth of attaining age a.


II. ALTERNATIVE SOLUTION 


1. In the special case of a population growing at a constant rate r under the
régime of a constant life table, the constant birth rate per head is given by

$$b = \frac{1}{\int_0^\infty e^{-ra} p(a)\, da} = \beta(r). \tag{21}$$

This suggests that when r is variable we may still have as a first approximation

$$b(t) = \frac{1}{\int_0^\infty e^{-r_t a} p(a)\, da} = \beta(r_t) \tag{22}$$

and that this expression may form the first term in a series expansion of some
kind. Evidence of this has, indeed, been shown³ in the case of a population
growing according to the logistic curve, but the formal justification of the
supposition was not fully established, nor was the law of the series expansion
determined. We now proceed to establish the series for the general case, using
as a starting point the result obtained in Part I.

We revert, then, to equation (4), and, dividing by N(t), we have

$$\frac{B(t)}{N(t)} = b(t) = c_0 - c_1 \frac{\varphi_1(t)}{\varphi_0(t)} + \frac{c_2}{2!} \frac{\varphi_2(t)}{\varphi_0(t)} - \frac{c_3}{3!} \frac{\varphi_3(t)}{\varphi_0(t)} + \cdots \tag{23}$$

But $\varphi_1/\varphi_0$ is the rate of increase per head at time t, which we have denoted by $r_t$,
that is

$$\varphi_1(t) = r_t\, \varphi_0(t). \tag{24}$$

To systematize notation, let us write $r_1$ instead of $r_t$, and denote by $r_2, r_3, \cdots$
successive derivatives of $r_1$ with respect to t. With this notation the following
scheme, homogeneous as regards the weight of the terms in the right hand
member, results

$$\begin{aligned} \varphi_0 &= \varphi_0 \\ \varphi_1 &= r_1 \varphi_0 \\ \varphi_2 &= r_1 \varphi_1 + r_2 \varphi_0 \\ \varphi_3 &= r_1 \varphi_2 + 2 r_2 \varphi_1 + r_3 \varphi_0 \\ \varphi_4 &= r_1 \varphi_3 + 3 r_2 \varphi_2 + 3 r_3 \varphi_1 + r_4 \varphi_0 \end{aligned} \tag{25}$$

again with binomial coefficients.

² An obvious extension of this result is that this representation of B(t) may still hold
approximately when the life table is variable, and the seminvariants $\lambda$ are accordingly
functions of t. We may expect this approximation to be serviceable when p(a, t) changes
but slowly with t, a condition that will usually be satisfied in practice. See A. J. Lotka,
Human Biology, 1931, vol. 3, p. 481.
³ A. J. Lotka, Proceedings Natl. Acad. Sci., 1929, vol. 15, p. 796.
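Eliminating derivatives of $\varphi$ from the right hand members of the set of
equations (25), we find

$$\begin{aligned} \varphi_1/\varphi_0 &= r_1 \\ \varphi_2/\varphi_0 &= r_1^2 + r_2 \\ \varphi_3/\varphi_0 &= r_1^3 + 3 r_1 r_2 + r_3 \\ \varphi_4/\varphi_0 &= r_1^4 + 6 r_1^2 r_2 + 4 r_1 r_3 + 3 r_2^2 + r_4 \end{aligned} \tag{26}$$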
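
The scheme (25) lends itself to direct computation of the ratios $\varphi_n/\varphi_0$ from numerical values of $r_1, r_2, \ldots$, without writing out the polynomials (26). The sketch below uses a hypothetical function name and arbitrary integer values of the $r$'s purely as a check against the explicit expressions.

```python
from math import comb


def derivative_ratios(r):
    """Ratios phi_n / phi_0 for n = 0, 1, ..., len(r), from scheme (25).

    r[0] is r_1, r[1] is r_2, and so on;
    phi_(n+1) = sum_k C(n, k) * r_(k+1) * phi_(n-k).
    """
    ratios = [1]
    for n in range(len(r)):
        ratios.append(sum(comb(n, k) * r[k] * ratios[n - k] for k in range(n + 1)))
    return ratios


if __name__ == "__main__":
    r1, r2, r3, r4 = 2, 3, 5, 7
    print(derivative_ratios([r1, r2, r3, r4]))
    # Explicit fourth-order expression from (26) for comparison.
    print(r1 ** 4 + 6 * r1 ** 2 * r2 + 4 * r1 * r3 + 3 * r2 ** 2 + r4)
    # Setting r_1 = 0 leaves r_2, r_3 and 3 r_2^2 + r_4.
    print(derivative_ratios([0, r2, r3, r4]))
```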


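Introducing the expression (26) for $\varphi_n/\varphi_0$ in (23), and rearranging terms, we find

$$\begin{aligned} b(t) ={}& \left(c_0 - c_1 r_1 + \frac{c_2 r_1^2}{2!} - \frac{c_3 r_1^3}{3!} + \cdots\right) \\ &+ \frac{r_2}{2!} \left(c_2 - c_3 r_1 + \frac{c_4 r_1^2}{2!} - \cdots\right) \\ &- \frac{r_3}{3!} \left(c_3 - c_4 r_1 + \frac{c_5 r_1^2}{2!} - \cdots\right) \\ &+ \frac{r_4 + 3 r_2^2}{4!} \left(c_4 - c_5 r_1 + \frac{c_6 r_1^2}{2!} - \cdots\right) \\ &- \frac{r_5 + 10 r_2 r_3}{5!} \left(c_5 - c_6 r_1 + \cdots\right) + \cdots \end{aligned} \tag{27}$$

It will be seen that the factors $r_2$, $r_3$, $(r_4 + 3 r_2^2)$, $(r_5 + 10 r_2 r_3)$, etc., by which,
in successive terms, the power series in $r_1$ are multiplied, are obtained by the
formal substitution $r_1 = 0$ in the corresponding expressions (26) for $\varphi_n/\varphi_0$, as for
example,

$$\begin{aligned} \left[\frac{\varphi_2}{\varphi_0}\right]_0 &= \left[r_1^2 + r_2\right]_0 = r_2 \\ \left[\frac{\varphi_3}{\varphi_0}\right]_0 &= \left[r_1^3 + 3 r_1 r_2 + r_3\right]_0 = r_3 \\ \left[\frac{\varphi_4}{\varphi_0}\right]_0 &= \left[r_1^4 + 6 r_1^2 r_2 + 4 r_1 r_3 + 3 r_2^2 + r_4\right]_0 = 3 r_2^2 + r_4 \end{aligned} \tag{28}$$

With this interpretation of the symbol $[\varphi_n/\varphi_0]_0$, we may therefore write

$$b(t) = \sum_{n=0}^{\infty} \frac{1}{n!} \left[\frac{\varphi_n}{\varphi_0}\right]_0 \frac{\partial^n \beta(r_1)}{\partial r_1^n} \tag{29}$$

with the understanding that

$$\frac{\partial^0 \beta(r_1)}{\partial r_1^0} = \beta(r_1).$$

Furthermore, since $\varphi_0/\varphi_0 = 1$ and $[\varphi_1/\varphi_0]_0 = 0$, equation (29) can also be written
in the form

$$b(t) = \beta(r_1) + \sum_{n=2}^{\infty} \frac{1}{n!} \left[\frac{\varphi_n}{\varphi_0}\right]_0 \frac{\partial^n \beta(r_1)}{\partial r_1^n} \tag{30}$$

which establishes the desired result, namely, that b(t) is expressed in terms of a
fully defined series, in which the first term is

$$\beta(r_1) = \frac{1}{\int_0^\infty e^{-r_1 a} p(a)\, da}. \tag{22}$$

It will be observed that equation (23) can be written

$$b(t) = \sum_{n=0}^{\infty} \frac{1}{n!} \left[\frac{\partial^n \beta}{\partial r^n}\right]_0 \frac{\varphi_n}{\varphi_0}$$

so that, in view of (29), we have

$$\sum_{n=0}^{\infty} \frac{1}{n!} \left[\frac{\partial^n \beta}{\partial r^n}\right]_0 \frac{\varphi_n}{\varphi_0} = \sum_{n=0}^{\infty} \frac{1}{n!} \frac{\partial^n \beta}{\partial r^n} \left[\frac{\varphi_n}{\varphi_0}\right]_0,$$

a somewhat remarkable relation.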

Analytically, our problem must thus be considered solved, but for purposes 
of computation, as well as on account of a certain analytical interest of their 
own, it is desirable to examine certain properties of the various characteristics 
that appear in the treatment of the problem. 





2. Successive partial derivatives of β(r). In the application of the formulae
(29) or (30), it is necessary to obtain successive partial derivatives of $\beta(r)$ with
respect to r. The values of the derivatives $\partial^n \beta / \partial r^n$ can be computed directly from
(9), but more exact values are obtained by taking advantage of certain special
properties of these derivatives. With this in view, it is desirable, first of all,
to consider certain properties of the moments $M_n$ and the seminvariants $\Lambda_n$
of the function

$$f(r) = e^{-ra} p(a). \tag{31}$$

We note that

$$M_n = \int_0^\infty a^n e^{-ra} p(a)\, da \tag{32}$$

$$\frac{\partial M_n}{\partial r} = -\int_0^\infty a^{n+1} e^{-ra} p(a)\, da = -M_{n+1}. \tag{33}$$

Now the seminvariants $\Lambda$ of the function $e^{-ra} p(a)$ are defined by

$$\begin{aligned} M_1 &= \Lambda_1 M_0 \\ M_2 &= \Lambda_1 M_1 + \Lambda_2 M_0 \\ M_3 &= \Lambda_1 M_2 + 2 \Lambda_2 M_1 + \Lambda_3 M_0 \\ M_4 &= \Lambda_1 M_3 + 3 \Lambda_2 M_2 + 3 \Lambda_3 M_1 + \Lambda_4 M_0. \end{aligned} \tag{34}$$

On the other hand, in view of (33)

$$\begin{aligned} M_1 &= \Lambda_1 M_0 \\ M_2 &= \Lambda_1 M_1 - \Lambda_1' M_0 \\ M_3 &= \Lambda_1 M_2 - 2 \Lambda_1' M_1 + \Lambda_1'' M_0 \end{aligned} \tag{35}$$

where the primes denote derivatives with respect to r.
Hence

$$\begin{aligned} \Lambda_2 &= -\Lambda_1' \\ \Lambda_3 &= -\Lambda_2' = \Lambda_1'' \end{aligned} \tag{36}$$

and generally

$$\Lambda_{n+1} = -\frac{\partial \Lambda_n}{\partial r} \tag{37}$$

that is, if successive moments are successive negative partial derivatives with
respect to r, the same is true of successive seminvariants.
Furthermore, we have

$$\frac{\partial \beta(r)}{\partial r} = \beta(r)\, \frac{\int_0^\infty a\, e^{-ra} p(a)\, da}{\int_0^\infty e^{-ra} p(a)\, da} \tag{38}$$

$$= \Lambda_1 \beta(r). \tag{39}$$

Hence, in this sense, and denoting successive partial derivatives of $\beta$ by
subscripts

$$\begin{aligned} \beta_1 &= \Lambda_1 \beta_0 \\ \beta_2 &= \Lambda_1 \beta_1 - \Lambda_2 \beta_0 \\ \beta_3 &= \Lambda_1 \beta_2 - 2 \Lambda_2 \beta_1 + \Lambda_3 \beta_0 \end{aligned} \tag{40}$$

a set of equations from which successive partial derivatives of $\beta(r)$ can be
obtained if the seminvariants $\Lambda$ are given.
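
The set (40) is again a binomial recurrence, now with alternating signs, and may be sketched as follows. The function name is hypothetical. The illustration assumes $p(a) = e^{-a}$, for which $e^{-ra} p(a) = e^{-(1+r)a}$, so that $\beta(r) = 1 + r$ and the seminvariants are $\Lambda_j = (j-1)!/(1+r)^j$; these closed forms belong to the illustration and not to the text.

```python
from fractions import Fraction
from math import comb


def beta_derivatives(beta0, big_lambda):
    """Successive derivatives beta_1, beta_2, ... from the scheme (40).

    big_lambda[0] is Lambda_1, big_lambda[1] is Lambda_2, and so on;
    beta_(n+1) = sum_k (-1)^k C(n, k) Lambda_(k+1) beta_(n-k).
    """
    lam = [Fraction(v) for v in big_lambda]
    beta = [Fraction(beta0)]
    for n in range(len(lam)):
        beta.append(sum((-1) ** k * comb(n, k) * lam[k] * beta[n - k]
                        for k in range(n + 1)))
    return beta


if __name__ == "__main__":
    # Illustration at r = 1 for p(a) = exp(-a): beta = 2, Lambda = 1/2, 1/4, 1/4.
    print(beta_derivatives(2, [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]))
```

Since $\beta(r) = 1 + r$ in this illustration, the first derivative is 1 and all higher derivatives vanish, which the recurrence reproduces.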

In actual computation the seminvariants $\Lambda$ are calculated according to (34)
from the moments M, which themselves must first be computed as functions of r.
If the entire computation is required for only one particular value of r, the
moments M may be calculated directly by numerical integration of (32). But
if their values are required for a series of values of r, direct computation would
be very laborious. Unless r is rather small, merely expanding the exponential
under the integral sign and integrating term by term is unsatisfactory as the
series converges too slowly. Much more rapid convergence is secured by
obtaining a series development not of the moments themselves, but of the ratio
between two successive moments, thus:

$$\frac{M_{n+1}}{M_n} = \frac{m_{n+1} - r\, m_{n+2} + \frac{r^2}{2!} m_{n+3} - \cdots}{m_n - r\, m_{n+1} + \frac{r^2}{2!} m_{n+2} - \cdots} = \lambda_{n1} - r \lambda_{n2} + \frac{r^2}{2!} \lambda_{n3} - \cdots \tag{41}$$

where $\lambda_{nj}$ is the jth seminvariant of $a^n p(a)$, that is,

$$\begin{aligned} m_n &= m_n \\ m_{n+1} &= \lambda_{n1} m_n \\ m_{n+2} &= \lambda_{n1} m_{n+1} + \lambda_{n2} m_n \\ m_{n+3} &= \lambda_{n1} m_{n+2} + 2 \lambda_{n2} m_{n+1} + \lambda_{n3} m_n \end{aligned} \tag{42}$$

Furthermore, according to (33)

$$\frac{1}{M_n} \frac{\partial M_n}{\partial r} = -\frac{M_{n+1}}{M_n} = -\left(\lambda_{n1} - r \lambda_{n2} + \frac{r^2}{2!} \lambda_{n3} - \cdots\right). \tag{43}$$

Hence

$$M_n = m_n \exp\left(-r \lambda_{n1} + \frac{r^2}{2!} \lambda_{n2} - \frac{r^3}{3!} \lambda_{n3} + \cdots\right) \tag{44}$$

a formula which enables us to compute directly the moments M of $e^{-ra} p(a)$
from the moments of p(a) and seminvariants of $a^n p(a)$. The seminvariants $\Lambda$
and the derivatives $\beta_1, \beta_2, \cdots$ then follow according to (34) and (40).


3. Recapitulation. By virtue of the various properties of the moments and
seminvariants thus developed, the following routine may be followed in the
computation of the successive derivatives $\beta_n$. By direct computation,
determine the moments $m_n$ of p(a). Then obtain in succession the several
characteristics as follows, the numbers over the arrows indicating the pertinent
equation in the text:

$$m_n \xrightarrow{(42)} \lambda_{nj} \xrightarrow{(44)} M_n \xrightarrow{(34)} \Lambda_n \xrightarrow{(40)} \beta_n$$


4. Numerical example. By way of illustration the results obtained in
preceding sections were applied to a logistic population for which a series expansion
of the annual births B(t) in terms of the logistic and its derivatives was
available from a previous computation⁴ carried out by a method less general than
the one here presented. Of special interest in the numerical results now to be
shown is the comparison between the two representations, on the one hand B(t)
in terms of $\varphi(t)$, the logistic in this case, and its derivatives; on the other hand
b(t) the birth rate per head, in terms of $\beta(r_1)$ and its partial derivatives with
respect to r.

The data on which these computations are based are derived from the actual
growth of the population of the United States, which from 1790 to 1930 followed
rather closely the logistic function


$$N(t) = \frac{197{,}493{,}000}{1 + e^{-0.0314 (t' - 1914)}} = \frac{N_\infty}{1 + e^{-0.0314 t}} = N_\infty \Phi(t)$$

where $\Phi(t)$ is used to distinguish the special case of the logistic function from
the general case $\varphi(t)$, and where $t'$ denotes the calendar year.
This was combined, in the computations, with the life table for white females,
United States 1919-1920, supposed constant throughout the period.⁵

Fig. 1. Annual births in logistic population, based on growth curve for U.S. and life table 1919/20 (total births and fundamental component, in millions).

Fig. 2. Birth rate per head in logistic population, based on growth curve for U.S. and life table 1919/20.

⁴ Human Biology, loc. cit.; Jl. Soc. Statistique, Paris, 1933, vol. 74, pp. 336, 341.
⁵ This is, of course, an arbitrary assumption made here simply for illustrative purposes.
The life table for white females was used because of related computations regarding the
intrinsic rate of natural increase, which have been reported on elsewhere.

With this basis, the fundamental data are as follows:

1. Quantities depending solely on the life table, namely, $m_n$, $\lambda_{nj}$. These
are exhibited in the first section of Table I.

2. Quantities depending on the life table and also on $r_1$, namely, $M_n$, $\Lambda_n$,
$\beta_n$. These are exhibited in the remaining sections of Table I.

5. Comparison of the representation (20) of the annual births B(t) and the
representation (30) of the annual birth rate b(t). It is interesting to make this
comparison, as applied to the case of the logistic population, for there are certain 
points of marked difference. The graphs Fig. 1 and Fig. 2 show this at a glance. 
In both cases the fundamental component alone yields a very fair approximation 
to the full solution, but the second component is of very different character 
in the two cases. In the composition of B(t) it starts from a vanishing value, 
diminishes through negative values to a minimum, then, passing through zero 
at the “center,” it rises to a maximum positive value, and finally approaches 
zero asymptotically from above. 

The second component of b(t), starting also from a vanishing value, forms 
a single downwardly convex loop, and then approaches zero asymptotically 
from below. 
The higher components in both cases are relatively insignificant. 












III. APPENDIX











1. Symbols used. It may be convenient to assemble together here certain
of the symbols used in the text:

$m_n = \int_0^\infty a^n p(a)\, da$ = nth moment of p(a)

$M_n = \int_0^\infty a^n e^{-ra} p(a)\, da$ = nth moment of $e^{-ra} p(a)$

$\lambda_{nj}$ = jth seminvariant of $a^n p(a)$

$\lambda_j$ = jth seminvariant of p(a)

$\Lambda_j$ = jth seminvariant of $e^{-ra} p(a)$

$\beta = 1 \Big/ \int_0^\infty e^{-ra} p(a)\, da$

$\beta_n = \partial^n \beta / \partial r^n$

$r_{n+1} = \partial^n r_1 / \partial t^n$

$[\varphi_n/\varphi_0]_0$: For definition of this, see equations (25), (26), (28)


2. Derivatives of y(t) and properties of the Logistic Function. In the par-
ticular case that g(t) is the logistic function $\Phi(t)$, the successive derivatives
$\Phi_1, \Phi_2, \Phi_3, \cdots$ may be obtained step by step by equations (25), (26), taking ad-
vantage of special properties of that function.
(46)    $\Phi(t) + \Phi(-t) = \dfrac{1}{1 + e^{-Kt}} + \dfrac{1}{1 + e^{Kt}} = \dfrac{e^{Kt}}{1 + e^{Kt}} + \dfrac{1}{1 + e^{Kt}} = 1$

Hence, putting

(47)    $\Psi(t) = \Phi(-t)$

we have

(48)    $\Phi + \Psi = 1$

(49)    $\Psi = 1 - \Phi$


Denoting the nth derivatives by the subscript n, it follows at once from (49) 
that 


(50) 


= ne / 1 
(51) b, a+ e-Kt)2 i + e-* 


(52) Tr = 
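Hence, in the case of the logistic, the algorithm (25) takes the form

(53)    $\Phi_1 = K\Phi_0\Psi_0 = K\Phi_0(1 - \Phi_0)$

(54)    $\Phi_2 = K\{\Phi_0\Psi_1 + \Phi_1\Psi_0\} = -K\Phi_1\{\Phi_0 - (1 - \Phi_0)\} = K^2\Phi_0(1 - \Phi_0)(1 - 2\Phi_0)$

        $\Phi_3 = K\{\Phi_0\Psi_2 + 2\Phi_1\Psi_1 + \Phi_2\Psi_0\} = K^3\Phi_0(1 - \Phi_0)(1 - 6\Phi_0 + 6\Phi_0^2)$

(55)    $\phantom{\Phi_3} = 6K^3\Phi_0(1 - \Phi_0)\left(\tfrac{1}{2} + \tfrac{\sqrt{3}}{6} - \Phi_0\right)\left(\tfrac{1}{2} - \tfrac{\sqrt{3}}{6} - \Phi_0\right)$
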

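It is seen that all derivatives vanish at $\Phi_0 = 0$ and at $\Phi_0 = 1$, that is at
$t = \mp\infty$. Furthermore, $\Phi_2$ vanishes at $\Phi_0 = \tfrac{1}{2}$, that is at $t = 0$; and $\Phi_3$ vanishes
at $\Phi_0 = \tfrac{1}{2} \pm \tfrac{\sqrt{3}}{6}$, that is at $\tanh\tfrac{Kt}{2} = \pm\sqrt{\tfrac{1}{3}}$, since

(56)    $\Phi(t) = \dfrac{1}{1 + e^{-Kt}} = \dfrac{1}{2} + \dfrac{1}{2}\tanh\dfrac{Kt}{2}$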
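
The closed forms for the first three derivatives of the logistic, expressed through $\Phi_0$, can be checked numerically. The following is a minimal sketch only: the value of K and the step size h are arbitrary assumptions chosen for illustration, and the function names are hypothetical.

```python
import math

K = 0.7  # hypothetical rate constant, chosen only for illustration


def phi(t):
    """Logistic function 1 / (1 + exp(-K t))."""
    return 1.0 / (1.0 + math.exp(-K * t))


def phi1(t):
    """First derivative, K * Phi * (1 - Phi)."""
    p = phi(t)
    return K * p * (1 - p)


def phi2(t):
    """Second derivative, K^2 * Phi * (1 - Phi) * (1 - 2 Phi)."""
    p = phi(t)
    return K ** 2 * p * (1 - p) * (1 - 2 * p)


def phi3(t):
    """Third derivative, K^3 * Phi * (1 - Phi) * (1 - 6 Phi + 6 Phi^2)."""
    p = phi(t)
    return K ** 3 * p * (1 - p) * (1 - 6 * p + 6 * p * p)


def central_difference(f, t, h=1e-3):
    """Numerical derivative of f at t (assumed step size h)."""
    return (f(t + h) - f(t - h)) / (2 * h)


# Abscissa at which tanh(K t / 2) = 1 / sqrt(3), where the third derivative vanishes.
t_star = (2.0 / K) * math.atanh(1.0 / math.sqrt(3.0))

for t in (-3.0, -1.0, 0.0, 0.5, 2.0):
    print(t, phi1(t) - central_difference(phi, t),
          phi2(t) - central_difference(phi1, t),
          phi3(t) - central_difference(phi2, t))
```

Each printed difference should be close to zero, and `phi3(t_star)` should vanish up to rounding.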
Successive derivatives of $\Phi$ can thus be computed successively according to
(53), (54), (55), etc. For purposes of record, however, it may be convenient
to note here explicit expressions for these derivatives, and a simple algorithm 
by which the numerical coefficients occurring in them can be written down at 
sight. It is found, by carrying out the differentiation directly, that 









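(57)
        $\Phi_1 = K e^{Kt}\,\dfrac{1}{(1 + e^{Kt})^2}$

        $\Phi_2 = K^2 e^{Kt}\,\dfrac{1 - e^{Kt}}{(1 + e^{Kt})^3}$

        $\Phi_3 = K^3 e^{Kt}\,\dfrac{1 - 4e^{Kt} + e^{2Kt}}{(1 + e^{Kt})^4}$

        $\Phi_4 = K^4 e^{Kt}\,\dfrac{1 - 11e^{Kt} + 11e^{2Kt} - e^{3Kt}}{(1 + e^{Kt})^5}$

        $\Phi_5 = K^5 e^{Kt}\,\dfrac{1 - 26e^{Kt} + 66e^{2Kt} - 26e^{3Kt} + e^{4Kt}}{(1 + e^{Kt})^6}$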


The numerical coefficients can be obtained by the modification of the Pascal 
triangle shown in Fig. 3. Its use is most easily explained by an example. 
Thus the coefficient —4 in the third line is obtained as the sum of the two im- 
mediately adjoining figures in the line above it, each multiplied by the rank 
of the oblique row in which it appears. This rank is indicated by the cor- 
responding number written above the ruled line forming the “roof” of the 
triangle. Thus the second coefficient in the third horizontal line from above is
obtained as (1 × -2) + (-1 × 2) = -4. Similarly, the third coefficient
in the last line of the diagram (which must be regarded actually as extending
indefinitely) is the sum of (-57 × -5) + (302 × 3) = 1191.


+1    -57    +302    -302    +57    -1
+1    -120    +1191    -2416    +1191    -120    +1









Fig. 3. Scheme for computing numerical coefficients in successive derivatives of log-
istic function.
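
The scheme can be sketched in a few lines of code. This is an illustration under stated assumptions, not a transcription of Fig. 3: the signed multipliers used below, (k + 1) for the figure above on the right and -(n - k + 1) for the figure above on the left, are inferred from the two worked examples in the text, and the function names are hypothetical.

```python
def next_row(row):
    """Build the next line of coefficients from the line above it.

    The line with n entries yields the line with n + 1 entries.  Each new
    entry is the sum of the two adjoining figures above, each multiplied by
    a signed rank (assumed here from the worked examples in the text).
    """
    n = len(row)
    new_row = []
    for k in range(n + 1):
        right = row[k] if k < n else 0       # adjoining figure above, right
        left = row[k - 1] if k > 0 else 0    # adjoining figure above, left
        new_row.append((k + 1) * right - (n - k + 1) * left)
    return new_row


def coefficient_rows(count):
    """Return the first `count` lines of the triangle, starting from [1]."""
    rows = [[1]]
    while len(rows) < count:
        rows.append(next_row(rows[-1]))
    return rows


rows = coefficient_rows(7)
for row in rows:
    print(row)
```

The third line reproduces the coefficient -4 of the first worked example, and the seventh line contains the 1191 of the second.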







2. Construction of coefficients in (26). The numerical coefficients appearing 
in equation (26) are constructed according to the following rules: 
(a). The expression $y_n/y_0$ contains all possible products of the form $r_a r_b r_c \cdots$
the sum of whose subscripts is n, due regard being had to powers of r. Thus
for example $y_4/y_0$ contains $r_1^4$, that is $r_1 r_1 r_1 r_1$; also $r_1^2 r_2$, $r_1 r_3$, $r_2^2$ and $r_4$.
(b). If a, b, c are all different, the coefficient $Q_{abc\cdots}$ of $r_a r_b r_c \cdots$ is formed
according to the following pattern, in which ${}^{n}C_{r}$ denotes, in the customary
notation, the binomial coefficient $\binom{n}{r}$:

(58)    $Q_{abc} = {}^{n}C_{n-(a+b+c)}\;{}^{a+b+c}C_{b+c}\;{}^{b+c}C_{c} = \dfrac{n!}{\{n-(a+b+c)\}!\,a!\,b!\,c!}$

If some of the subscripts are equal, that is if some of the factors occur as the
sth power of r, then the formula for Q is modified by the introduction of the
corresponding factorial s! in the denominator, according to the pattern of the
following example: If b = c, so that $r_a r_b r_c = r_a r_b^2$,
then the corresponding coefficient is

(59)    $Q_{abb} = \dfrac{1}{2!}\;{}^{n}C_{n-(a+2b)}\;{}^{a+2b}C_{2b}\;{}^{2b}C_{b}$

(60)    $\phantom{Q_{abb}} = \dfrac{n!}{2!\,\{n-(a+2b)\}!\,a!\,(b!)^2}$

More generally, the coefficient of $r_a^u r_b^v r_c^w \cdots$ is

(61)    $Q_{abc\cdots}^{(uvw\cdots)} = \dfrac{n!}{(a!)^u (b!)^v (c!)^w \cdots\; u!\,v!\,w! \cdots}$

where

(62)    $a > b > c$

and

(63)    $au + bv + cw + \cdots = n$


Formula (59) may be found more convenient than (60) if a table of the bi- 
nomial coefficients is available; for in the case here exhibited for example, formula 


(59) requires only 3 tabular values to be looked up, whereas formula (60) calls 
for 4. 
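
The two ways of writing the coefficient for two equal subscripts, and the general factorial form, can be compared directly. The sketch below is illustrative only; the function names and the dictionary representation of a product (subscript mapped to its power) are assumptions introduced here.

```python
from math import comb, factorial


def q_general(powers):
    """General coefficient: n! / prod((a!)^u * u!), with n = sum(a * u).

    `powers` maps each subscript a to the power u to which r_a occurs.
    """
    n = sum(a * u for a, u in powers.items())
    denominator = 1
    for a, u in powers.items():
        denominator *= factorial(a) ** u * factorial(u)
    return factorial(n) // denominator


def q_two_equal_binomial(n, a, b):
    """Coefficient for subscripts a, b, b as a product of three binomials."""
    m = a + 2 * b
    return comb(n, n - m) * comb(m, 2 * b) * comb(2 * b, b) // 2


def q_two_equal_factorial(n, a, b):
    """The same coefficient written with factorials."""
    m = a + 2 * b
    return factorial(n) // (2 * factorial(n - m) * factorial(a) * factorial(b) ** 2)


# The five products whose subscripts sum to 4.
for powers in ({1: 4}, {1: 2, 2: 1}, {1: 1, 3: 1}, {2: 2}, {4: 1}):
    print(powers, q_general(powers))

print(q_two_equal_binomial(4, 2, 1), q_two_equal_factorial(4, 2, 1))
```

For n = 4, a = 2, b = 1 the binomial and factorial forms agree with the general form applied to the product of $r_2$ with the square of $r_1$.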


It may be noted that coefficients of this form occur in certain formulae re-
lating to seminvariants,⁶ also in the theory of partitions.⁷


METROPOLITAN LIFE INSURANCE CoMPANY, NEW YORK. 


6 See for example R. Frisch, Sur les semi-invariants et moments employés dans l'étude
des distributions statistiques, Oslo, 1926; C. C. Craig, Metron, 1928, vol. VII, p. 10. 

7 Dwyer, Annals of Mathematical Statistics, 1938, vol. 8, p. 21; vol. 9, pp. 4, 8. E. G. 
Olds, Bulletin Am. Math. Soc., 1938, vol. 44, p. 412. H.S. Wall, Bulletin Am. Math. Soc., 
1938, vol. 44, p. 395; P. S. Dwyer, Annals Math. Statistics, 1938, vol. 9, p. 116. E. A. 
Cornish and R. A. Fisher, Revue de l’ Institut Internat. Statist., 1937, vol. 5, p. 307. 





THE INTERPRETATION OF CERTAIN REGRESSION METHODS AND
THEIR USE IN BIOLOGICAL AND INDUSTRIAL RESEARCH¹


By C. Eisenhart


1. Introduction. Just as the scientific theorist depends upon the research 
worker for the facts upon which to build his theory, so does the practical man 
rely upon empirical relationships to help him estimate (or predict) the value of 
one quantity from that of another. Sometimes he is interested in assessing the 
value of some quantity which it is impracticable or impossible to observe directly 
in a given instance, the estimation being performed with the aid of a previously 
established relationship between the quantity whose value is sought and another 
whose value can be determined directly. In other instances he wishes to make 
use of the relationship existing between two or more quantities to help him
adopt a course of action which has a good chance of leading him to a desired 
result. An example is that of a manufacturer who wishes to exercise control at 
various stages of a manufacturing process so as to produce a product whose 
quality lies within a specified range. 

In appealing to the interests of the practical man, proponents of statistical 
methods have often illustrated their writings with beautiful examples of the 
power of this implement of research, without adequately discussing the abstract 
ideas that underlie the methods they have promoted—ideas essential to correct 
statistical thinking. The result has been that to many research workers certain 
problems with similar objectives appear amenable to identical statistical solu- 
tion, when in fact intrinsic differences exist which alter considerably the details of 
their solution. 

Such misinformation is particularly prevalent among those whose knowledge 
of the mathematics of correlation, and of curve fitting, has been derived from 
the treatment in elementary statistics courses of problems in which no one of 
the variables stands out from the rest as being the dependent variable, with 
its values determined (not exactly, but within limits) from the values that 
happen to be assumed by the other variables in the data under investigation. In 
elementary courses the usual procedure in such cases is to take one of the variables 
as the dependent variable, and then consider the others as independent variables. 
Furthermore, the curve-fitting procedure usually adopted depends on the addi- 
tional assumption that the values of the independent variables are known exactly 
(without error)—an assumption often passed by without mention, and one that 


1 Revised from an expository paper presented, under different title, to the American 
Statistical Association, at Detroit, December 29, 1938, at the invitation of the program 
committee of the Biometrics Section. 
introduces artificiality into the analysis and imposes limitations on the range of
applicability of the inferences drawn. This simplification of problems without 
explicit mention of the fact, fosters misconceptions that are carried over into 
analyses of data in which the dependent variable is definitely a particular one 
of the variables and no other—a particularly bad misconception being that the 
variable whose value is to be estimated automatically assumes the rôle of the
independent variable. The calculation and use of dosage-response curves in 
problems of biological assay constitute an example, and a case which has been 
correctly solved. The dosage-response curve should be evaluated from a series 
of observations, with dosage as the independent variable, and the curve then used 
to estimate unknown dosages from observable responses. 

It is one object of the present paper to pass in review some of the ideas involved 
in current curve-fitting practices so that the reader can see for himself why, 
when one is interested in estimating X from Y, in some instances it is necessary
to follow out curve-fitting practices with Y as the dependent variable, and then 
use the inverse of the relation found. In addition, it is an object of this paper to 
indicate the types of problem to which this method of inverse regression affords a 
solution, and to emphasize the confidence interval nature of the estimates it 
provides. The method will be exemplified by working out in detail a problem 
arising in the manufacture of cheese, and also a problem concerned with the 
biological assay of a hormone substance.²


















2. Mathematical Aspects of the Formulation of Empirical Relationships.
Probably the most obvious way of investigating whether any relationship exists 
between two variables is that of plotting the observed pairs of values on graph 
paper. For simplicity we shall confine our attention in this paper to the case of 
only two variables. While the general trend of the plotted points may suggest 
the existence of a relationship, the plotted points themselves do not give a 
definite expression of that relationship, and it is often desirable to have a formula 
of some sort that expresses it concisely. Furthermore, in all branches of science 
the data of the research worker are subject to all sorts of fluctuations which 
tend to make the observational points scatter about a general trend in a band 
not unlike the Milky Way. Consequently various methods have been developed 
for inferring from the observations the ‘true relation’ between the quantities 
concerned, or, more exactly, a relation which it is hoped will be sufficiently 
close to the ‘true relation’ for the purposes in mind. 

In the development of these methods two rather different viewpoints had to 
be taken into consideration: first, that of the physical scientist who views the 
irregular fluctuations as being quite apart from the phenomena under observa- 
tion and arising solely from inaccuracies of measurement and experimental 

















2 Those who are primarily interested in problems of biological assay will find additional 
material in references [26] to [31]; those whose interests lie in the direction of quality 
control are referred to W. A. Shewhart [9], and E. S. Pearson [5]. Numbers in [ ] refer to 
the references at end of the paper. 
technique; secondly, that of the biological and social scientists who attribute a
large portion of the apparent irregularity of their observations to a real varia-
bility which is an essential part of the phenomena studied. That two such
divergent viewpoints could be brought together on a common ground is a tribute
to the pioneers in mathematical statistics, and the manner in which it has been
effected is indicated by the following entry in E. S. Pearson’s notebook³ for
1921-22:

“The purpose of the mathematical theory of statistics is to deal with the 
relationship between 2 or more variable quantities, without assuming that one 
is a single-valued mathematical function of the rest. The statistician does not 
think that a certain x will produce a single-valued y; not a causative relation 
but a correlation. The relationship between x and y will be somewhere within a
zone and we have to work out the probability that the point (x, y) will lie in
different parts of that zone. The physicist is limited and shrinks the zone into a
line. Our treatment will fit all the vagueness of biology, sociology, etc. A
very wide science.”

When viewed from this angle, the fundamental problem in the determination 
of a relationship between two variables, say X and Y, is to determine as accur- 
ately as possible from the data in hand the simultaneous probability distribution 
of the observable quantities, say x and y, considered as random variables. There 
is, however, a subtle but important distinction between the cases in which the 
random variability of x and y is due to errors of measurement, etc., and the cases 
in which this random variability is, as in biological variation, a part of the 
phenomena under investigation. In the latter we postulate the existence of a 
probability distribution of the random point (x, y) about some point of location
(X, Y), where the exact meaning of the coördinates X and Y depends on the
nature of the probability distribution, although they will generally be the co-
ordinates of the mode. In these cases, since (x, y) is subject to biological
variation only, (x, y) will lie on the line X’ = constant only in cases where
x = X’. Accordingly, along a line x = X’ we shall have the probability distri-
bution of the random point (X’, y) about some point of location $(X', Y_{X'})$.
This may not be true when x is also or alone subject to experimental error, for
here we postulate a separate probability distribution of (x, y) for each ‘true
point’ (X, Y), and when there are ‘errors’ in both coördinates, (x, y) can lie on the
line x = X’ when X ≠ X’. In these cases, the observed distribution of (x, y)
for x = X’ may result from sampling more than one probability distribution and
cannot be interpreted simply. If, however, the X-coördinate is never subject to
error, the distribution of (x, y) for x = X’ samples the probability distribution of
(x, y) for $(X', Y_{X'})$, where $Y_{X'}$ is the true value of Y for X = X’. Clearly
similar remarks apply in terms of y and Y. 


3 E. S. Pearson, Biometrika, vol. XXIX, parts III and IV, (1938) p. 208, writes: “I
find on page 1 of my Notes the following statement, which was probably taken down
fairly closely from (Karl) Pearson’s words: ‘The purpose of . . . .’ ”
Actually at the outset it is not customary to embark on the solution of such a
general problem as the determination of the simultaneous probability distribu-
tion of x and y. Instead, in the cases where both x and y are subject to ‘error’,
it is customary to assume that the distributions of x about X, and y about Y,
are of some particular functional form and then seek to estimate from the data
the ‘true relation’ $\varphi(X, Y) = 0$. Likewise, when x and y are subject only to
biological variation, say, it is customary to seek an estimate of the functional
relation $\varphi(X, Y_X) = 0$, or of the relation $\varphi(X_Y, Y) = 0$, where $Y_X$ and $X_Y$
denote some sort of average (not necessarily the mathematical expectation) of y
for a given X and of x for a given Y, respectively, the former being interpreted
as being the ‘true relation’ between X and the average value of y for that X,
with a similar interpretation for the latter function. Furthermore, in these
cases of mere biological variation it is customary to take x = X, y = Y, that is, 
to assume that what we observe are the true values of the quantities, that any 
errors of measurement are negligible compared to the sampling fluctuations 
arising from real biological variation. 

So far as I know all methods of utilizing observed values of two variables to 
obtain a relation between the two variables that it is hoped will be sufficiently 
close to the true relation for the purposes in mind involve the following steps: 

(1) To assume that the observational points $(x_1, y_1), (x_2, y_2), \cdots, (x_N, y_N)$
differ from the points $(X_1, Y_1), (X_2, Y_2), \cdots, (X_N, Y_N)$ as the result of
observational errors⁴ involved in the x, or in the y, or in both coördinates.

(2) To assume, either from the general appearance of the graph of the
plotted points or from theoretical considerations, that the relationship between
X and Y is of the form $\varphi(X, Y; \alpha_0, \alpha_1, \cdots, \alpha_{k-1}) = 0$, where $\varphi$ is some definite
mathematical function involving k, k < N, constants whose values are un-
known. If it is not assumed that $\varphi$ is the true functional relation between
X and Y, then it is assumed that the functional relation specified by the $\varphi$
will be adequate for the purposes in mind.

(3) To choose as an estimate of $\varphi$ the function $\hat{\varphi} = \varphi(X, Y; a_0, a_1, \cdots, a_{k-1})$
where the a’s are those values of the $\alpha$’s that render $\hat{\varphi}$ the function of form $\varphi$
which is the best fit to the observed points $(x_i, y_i)$, $(i = 1, 2, \cdots, N)$, in
some sense of the word “best”; and finally, a step which is too often overlooked,

(4) To carry out some test of goodness of the fit of $\hat{\varphi}$ to the observed points
upon the outcome of which rests the decision as to whether a function of the
form $\varphi$ can adequately describe the observed relation between the x’s and
y’s, and, if the decision be affirmative, accepting $\hat{\varphi}$ as an estimate of the true
function of form $\varphi$.





4 The word “error” here should be interpreted as “experimental or technical error”
from the viewpoint of the physical sciences, (which errors are unbiased in the sense that
they average out in the long run), and as “biological variation” from the viewpoint of the
biological scientist. In the latter case, if the biological variation is involved in y, and not
in x, then $x_i = X_i$ and $Y_i = Y_{X_i}$; a similar statement holding if x is in error but not y.
In the former case, X and Y are the “true values” of the variables.
In connection with step (4) some results were obtained by W. E. Deming [2]
for the case where $\varphi$ is fitted by the method of least squares. He has found that the
sum of squared residuals, which is the function to be minimized by the fitting
procedure, is fairly sensitive to changes in the functional form of $\varphi$, that is, to
changes which alter its graph within the range of the observations, but much
less sensitive to changes in the values of the parameters involved in a particular
functional form. Consequently, by comparing the minimum values of the
sums of squared residuals for two different functional forms $\varphi_1$ and $\varphi_2$ under
tentative consideration, it will often be possible to make a good choice between
them. On the other hand, it may be possible to alter considerably the values of
the parameters in the functional form chosen without appreciably altering the
value of the sum of squared residuals. From this it is seen that $\varphi$ may not be
well determined by $\hat{\varphi}$ even though the functional form of $\varphi$ may be the correct
one for the relationship under investigation. For the case where X is exactly
known for each observation, with only y subject to error, Deming shows that
for the same sum of squared residuals $\varphi$ is better determined by $\hat{\varphi}$ when there is a
long range in X than when there is a short range. In terms of the measure of
goodness of fit appropriate to any method of curve fitting these conclusions will 
probably carry over to that method of curve fitting. 

Step (2) also deserves further comment: The function $\varphi$ may be such that $\hat{\varphi}$
fits the data well within the range of x and y studied, but it must be remembered
that an infinite number of other formulae exist which could be adjusted so as to
fit the observed points equally well, and some might be found which could be
made to fit better. Once a particular functional form for $\varphi$ has been chosen, if $\hat{\varphi}$ is
used to “extrapolate” beyond the range of the observed points, or, if $\hat{\varphi}$ is used as
the relation between X and Y in any theoretical considerations, it must be re-
membered that the soundness of any inference that can be made rests to a large
extent on the validity of the logic or theoretical considerations that lead to the
choice of $\varphi$ as the expression of the functional relation between the variables,
and that the goodness of fit of $\hat{\varphi}$ for one particular batch of data is not a justifica-
tion of these extensions.


3. Some general remarks on curve fitting practices. 
In many cases the assumption is made that a linear relation prevails between 
X and Y, that is, it is assumed that 


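(1)    $\alpha_0 + \alpha_1 X + \alpha_2 Y = 0$

which may be written in the equivalent forms

(2)    $Y = \alpha + \beta X$, where $\alpha = -\alpha_0/\alpha_2$, and $\beta = -\alpha_1/\alpha_2$
(3)    $X = \gamma + \delta Y$, where $\gamma = -\alpha_0/\alpha_1$, and $\delta = -\alpha_2/\alpha_1$.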


We are adopting for the moment the viewpoint of the physical scientist, and 
assuming that (1) represents the true relation between X and Y. We shall 
return to the case of biological variation later. 
A common impression on the part of the research worker, regarding the
principles of curve-fitting, seems to be: If one is interested in estimating Y from
X, then take Y = a + bX as the estimate of (2), and therefore of (1), the a and
b being those values which make the line a good fit in terms of the deviations
(y - Y); if one were fitting by the method of least squares one would find the
a and b that minimize $\sum (y - Y)^2$, $\sum$ denoting summation over the observed
values of y and their corresponding Y values. On the other hand, if one is in-
terested in estimating X from Y, then X = c + dY is to be fitted, the values
of c and d being chosen so as to make X a good fit in terms of the deviations
(x - X). It does not seem to be generally realized that the fitting should be done in
terms of the deviations which actually represent “error.” Thus when the research
worker selects the X values in advance, and holds x to these values without error,
and then observes the corresponding y values, the errors are in the y values, so that
even if he is interested in using observed values of Y to estimate X, he should never-
theless fit Y = a + bX and then use the inverse of this relation to estimate X, i.e.
X = (Y - a)/b, with the best available estimate of Y substituted for Y. The situ-
ation is quite clear if one approaches the problem from the point of view of fitting
the formula to the data with proper attention to which of the variables is in error,
as has been recognized for a long time by writers on least squares. If both
variables are in error, then this approach also leads to the appropriate solution.⁵

In order to clarify this point it will be helpful to examine the matter a little 
closer from the viewpoint of the theory of least squares. 

Let us consider the case where the values of X are selected (or adjusted) by
the research worker and then the corresponding values of Y found by observa-
tion. So far as the method of least squares is concerned in any given instance
one could minimize $\sum (y - Y)^2$ and $\sum (x - X)^2$, thereby obtaining the two lines

(4)    $Y = a + bX$

(5)    $X = c + dY$, respectively,


and, unless there existed a perfect correlation between the observed values of 
X and Y—i.e. unless all of the observed points were exactly collinear, these two 
fitted lines would differ and yield different estimates of (1). There is nothing 
in the method of least squares to help us choose between these, but from the 
viewpoint of the theory of least squares the correct choice in a given instance is 


quite clear.’ The results of the two fitting processes may be given side by side as 
follows: 


5 See, for example, Deming [1]. Deming pays his respects to a paper by Kummel in 
The Analyst (Des Moines) vol. 6 (1879), pp. 97-105; also to a paper by Uhler, J. Optical Soc. 
vol. vii (1923), pp. 1043-1066. 
(6)

$\sum (y - Y)^2$ minimized:

    $b = \dfrac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2}$,    $a = \bar{y} - b\bar{x}$

    Analysis of Variance I                                                      d.f.
    Total variability of y’s about their mean:  $\sum (y - \bar{y})^2$           N - 1
    Reduction effected by (4):  $b\sum (x - \bar{x})(y - \bar{y})$               1
    Deviations about Y:
      $\sum (y - \bar{y})^2 - b\sum (x - \bar{x})(y - \bar{y}) = \sum (y - Y)^2$   N - 2

$\sum (x - X)^2$ minimized:

    $d = \dfrac{\sum (x - \bar{x})(y - \bar{y})}{\sum (y - \bar{y})^2}$,    $c = \bar{x} - d\bar{y}$

    Analysis of Variance II                                                     d.f.
    Total variability of x’s about their mean:  $\sum (x - \bar{x})^2$           N - 1
    Reduction effected by (5):  $d\sum (x - \bar{x})(y - \bar{y})$               1
    Deviations about X:
      $\sum (x - \bar{x})^2 - d\sum (x - \bar{x})(y - \bar{y}) = \sum (x - X)^2$   N - 2


In all instances $\sum$ denotes summation of the expression following it over all the
observed values; $\bar{x} = (1/N)\sum x$, the arithmetic mean of the chosen values of X;
and $\bar{y} = (1/N)\sum y$, the arithmetic mean of the observed values of Y. The expres-
sion in the middle row of each table of the analysis of variance is an immediate
consequence of the minimizing process employed; the last row is obtained by
subtraction.
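
The two fitting processes can be carried out side by side on a small set of numbers. The sketch below is illustrative only: the data are hypothetical (five chosen X values and invented y observations), and the function name is an assumption. It computes b = Σ(x - x̄)(y - ȳ)/Σ(x - x̄)², a = ȳ - b x̄, d = Σ(x - x̄)(y - ȳ)/Σ(y - ȳ)², c = x̄ - d ȳ, and checks that the last row of the first table, obtained by subtraction, equals the sum of squared deviations about the fitted line.

```python
def fit_both_lines(xs, ys):
    """Least-squares fits of Y = a + bX and X = c + dY to the same points."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    d = sxy / syy
    c = x_bar - d * y_bar
    return {"a": a, "b": b, "c": c, "d": d, "sxx": sxx, "syy": syy, "sxy": sxy}


# Hypothetical data: X values chosen in advance, y values observed with error.
xs = [1, 2, 3, 4, 5]
ys = [1.1, 1.9, 3.2, 3.8, 5.0]

fit = fit_both_lines(xs, ys)

# Analysis of Variance I: total, reduction effected by the line, remainder.
total_y = fit["syy"]
reduction_y = fit["b"] * fit["sxy"]
remainder_y = total_y - reduction_y

# Direct computation of the squared deviations about the fitted Y line.
direct_y = sum((y - (fit["a"] + fit["b"] * x)) ** 2 for x, y in zip(xs, ys))

print("slope of Y on X, b:", fit["b"])
print("slope of X on Y, d:", fit["d"], " inverse of b:", 1 / fit["b"])
print("remainder by subtraction:", remainder_y, " computed directly:", direct_y)
```

Unless the points are exactly collinear, d differs from 1/b, so the second line is not simply the first solved for X.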

Let us now interpret these analysis of variance tables. On the left, $\sum (y - \bar{y})^2$
gives a measure of the observed variability of the y values, a portion of this
variability being due, we suppose, to the dependence of Y on X. The second
row of table I gives the portion (the maximum portion on the basis of the ob-
servations) of the observed variability of the y’s that can be attributed to the
dependence of Y on X, and the last row indicates the magnitude of the rest,
that is, the magnitude of the portion of $\sum (y - \bar{y})^2$ that must be attributed to
“error” (and this portion has been minimized by the fitting process). In
short, remembering that we are dealing with the case in which the values of X are
chosen by the research worker and only the values of Y are subject to error, the relation
between X and Y being as in (1) or its equivalent form (2), we see that the analysis
of variance table on the left separates $\sum (y - \bar{y})^2$ into portions whose meanings are
clear. In particular, since unrelated variables can exhibit relationship in finite
samples, the test of whether $\beta$ is really different from zero resolves itself into
examining whether the variance ratio

    $\left(\dfrac{b\sum (x - \bar{x})(y - \bar{y})}{1}\right) \Big/ \left(\dfrac{\sum (y - \bar{y})^2 - b\sum (x - \bar{x})(y - \bar{y})}{N - 2}\right)$

is of a magnitude that may be taken to indicate $\beta \neq 0$ in the sense that the risk
of falsely rejecting the hypothesis that $\beta = 0$ by so doing is of an acceptable
smallness.
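
The variance ratio is simple to compute once the sums of squares and products are in hand. The following is a minimal sketch on hypothetical data; the function name is an assumption, and the judgment of whether the ratio is large enough is left to a table of the variance-ratio distribution with 1 and N - 2 degrees of freedom.

```python
def variance_ratio(xs, ys):
    """Ratio of the reduction due to the line (1 d.f.) to the residual mean square."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    reduction = b * sxy            # second row of table I, 1 degree of freedom
    residual = syy - reduction     # last row of table I, N - 2 degrees of freedom
    return (reduction / 1) / (residual / (n - 2))


# Hypothetical data: X values chosen in advance, y values observed with error.
xs = [1, 2, 3, 4, 5]
ys = [1.1, 1.9, 3.2, 3.8, 5.0]

ratio = variance_ratio(xs, ys)
print("variance ratio:", ratio)
```

A large ratio, as with these nearly collinear points, is what one would take as evidence that the slope differs from zero.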
The analysis of variance table on the right, on the other hand, can be misleading
if it is interpreted hastily. In the first place $\sum (x - \bar{x})^2$ represents the variability
in the chosen values of X which resulted from the way in which the research
worker selected (or adjusted) them, and it is to be noted that the corresponding
values observed for Y have in no way entered into their determination. Con-
sequently the apparent dependence of the x on the y, measured by d, or more
effectively by the second row of table II, is a spurious dependence, and the last
row of this table cannot be interpreted as being a measure of the “error” in the x
values, in the sense of being that portion of the variability of the x values which
cannot be accounted for by the variability of the y values. Briefly stated, when
the values of x have been selected by the research worker and the corresponding y values
observed, the line obtained by minimizing $\sum (x - X)^2$ is meaningless, and (4) is
accordingly the only correct estimate of the postulated linear relationship between
X and Y, wherefore, if it is desired to reason from Y to X this must be done by means
of X = (Y - a)/b, namely (4) solved for X.
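
The inverse use of the fitted line can be sketched as follows. The data and function names are hypothetical and serve only to illustrate the order of operations: fit Y on X, then solve the fitted line for X.

```python
def fit_y_on_x(xs, ys):
    """Least-squares line Y = a + bX, with errors assumed to lie in y only."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


def estimate_x(y_value, a, b):
    """Reason from Y to X by solving the fitted line for X."""
    return (y_value - a) / b


# Hypothetical calibration data: X chosen by the worker, y observed.
xs = [1, 2, 3, 4, 5]
ys = [1.1, 1.9, 3.2, 3.8, 5.0]

a, b = fit_y_on_x(xs, ys)
print("estimated X for an observed Y of 3.0:", estimate_x(3.0, a, b))
print("estimated X for an observed Y of 4.5:", estimate_x(4.5, a, b))
```

Since the fitted line passes through the point of means, an observed Y equal to the mean of the y values is carried back to the mean of the chosen X values.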

In the preceding paragraphs we have discussed the case where one of the 
variables is subject to random variation, and the other takes only those values 
selected (or, to which it is adjusted) by the research worker. Without loss of 
generality we took Y to be the former variable, and X the latter. Actually we 
have discussed only the case in which (1), or one of its forms, (2) or (3), is as- 
sumed to express the ‘true relation’ between X and Y. That is, we have been 
discussing the case where y varies about Y as a result of experimental ‘error,’ 
and we have not treated the case where y is subject to biological variation. 

If X takes only those values selected by the research worker, and y is subject 
to biological variation but is known without observational error, so that y = Y, 
(1) no longer applies for the reasons given in section 2, but it must be replaced by 


(7)    $\alpha_0 + \alpha_1 X + \alpha_2 Y_X = 0$

where $Y_X$ is the ‘average’ value (but not necessarily the arithmetic mean or
mathematical expectation) of Y for the value of X denoted by the subscript.
Clearly (7) may also be written in a form corresponding to (2),

(8)    $Y_X = \alpha + \beta X$, with $\alpha = -\alpha_0/\alpha_2$ and $\beta = -\alpha_1/\alpha_2$

or in a form corresponding to (3),

(9)    $X = (Y_X - \alpha)/\beta = -\alpha/\beta + (1/\beta)Y_X$.


With this latter form we may contrast

(10)    $X_Y = \gamma + \delta Y$,

a relation expressing “‘the true average value of X for a given Y” as a linear 
function of Y. Equation (10) is of interest, as well as (8), when X is free to 
vary in samples according to the biological variation associated with it, but when 
the distribution of values of X is dictated by the wishes of the research worker, 
as in the case under discussion, it can be demonstrated that (10) is of no value 
for purposes of inference. 
The method adopted for estimating (7), or one of its alternative forms, will
depend on what “average” $Y_X$ is taken to be. If, as is usually the case, $Y_X$
denotes the true arithmetic mean (or mathematical expectation) of Y for a given
value of X, then (4) fitted by the method of least squares as above affords an
unbiased estimate of (8). Or, if $Y_X$ were taken to be the true median of Y for
a given X, then in general one would fit (4) by minimizing $\sum |y - Y|$, the
summation being taken over the observed y values. As in the discussion of the 
case involving experimental error, to estimate X from Y one would estimate (9) 
with (4) solved for X, and in a particular instance replace Y by the best available 
estimate of Yx from the data in hand. This brings out the strong similarity 
between statistical procedures appropriate when the variables are subject to 
experimental error and when on the other hand they are subject to biological 
variation but can be accurately observed. 

A great injustice would be done to many previous writers by failure to mention 
at this point that the ideas and the conclusions reached in the preceding para- 
graphs have been appreciated for a long time by some of the writers who have 
developed the theory and applications of curve fitting. At most, the preceding 
paragraphs are but an emphatic way of presenting what these experts would 
regard as obvious. 


4. Effect of Limiting the Range of Either Variable in the Sampling Process. 

In the preceding section we have discussed the situation in which one of the 
variables does not vary at random, but assumes only those values selected by the 
research workers. We have seen that in such cases this variable must be taken 
as the independent variable in applying any curve-fitting procedures. The 
same conclusion applies when both of the variables are subject to biological 
variation but the sampling process limits the observed range of one of the 
variables—only the results obtained by using the restricted variable as inde- 
pendent variable can be expected to give an unbiased description of the under- 
lying relationship in the population sampled. If X is the variable for which the 
range of observable values is constricted by the sampling process, this means 
that the relation (8), for the population sampled, can be estimated from the 
data; but relation (10) for the population is unattainable. 

To illustrate this point it will be sufficient for our purposes to consider Figure 1 
which has been constructed from some artificial data which are especially suited
to this purpose. We shall suppose that Y is the dependent variable and X the 
independent variable, and that the complete array of points shown arose from a 
sampling process in which neither X nor Y was restricted. It will be noticed 
that the observational points lie in a band sloping upward to the right and that 
as x increases by one unit the distribution of the corresponding y’s moves up 
by one-half a unit. We may consider the points of the entire band shown as 
portraying the relationship between X and Y in the large, that is, when a point 
(x, y) is selected at random without restrictions on either X or Y. The slanting 
line labelled (I) indicates the “average” relationship prevailing between Y and
X, that is, for a given value of X the arithmetic mean of the corresponding
observed values of Y is given by the point on this line with abscissa X. 

Let us now consider the situation in which the points have been selected with 
restriction on X. As the results of such a procedure of selection let us take 
only those points between the two vertical lines drawn just to the right of X = 3
and just to the left of X = 7. It will be seen that this does not upset the average
y for a given value of x within the prescribed limits, i.e. $\bar{Y}_X$ is unaltered for
3 < X < 7. In other words, the introduction of a restriction with regard to X, the
independent variable, has not spoiled the inferences with regard to Y, when Y is 
considered as the dependent variable—that is, when we are arguing from X to Y. 
Consider now the effect of restricting the observed y in a sampling process
and then trying to infer about $\bar{Y}_X$ in the population at large from given values of
X. In Figure 1 this corresponds to considering, say, only those points that lie
between the horizontal lines just above Y = 3 and just below Y = 7. It is seen
immediately that in this case, i.e., between the horizontal lines, for every value
of X the average of the observed Y values is Y = 5, and consequently the relation
of Y to X is portrayed by the line numbered (II). It is seen that in this case the
“apparent” relation is not the correct one. Accordingly, we conclude that the 
restriction of the dependent variable is liable to seriously distort the relationship, so 
that what is observed is not representative of the true underlying situation. 
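
The contrast between the two kinds of restriction can be checked by a small sampling experiment. The sketch below is only an illustration under assumed conditions: the population (x uniform between 0 and 10, y scattered normally about 2.5 + 0.5x with standard deviation 1.5), the seed, and the helper name `ls_slope` are all hypothetical choices, not the data of Figure 1. It fits the least-squares line of y on x to the whole population, to the points with 3 < x < 7, and to the points with 3 < y < 7.

```python
import random


def ls_slope(points):
    """Least-squares slope of y on x for a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    return sxy / sxx


rng = random.Random(1939)  # fixed seed, so repeated runs agree
TRUE_SLOPE = 0.5           # y moves up half a unit per unit of x

# Hypothetical population (an assumption, not the data of Figure 1).
population = []
for _ in range(50_000):
    x = rng.uniform(0.0, 10.0)
    y = 2.5 + TRUE_SLOPE * x + rng.gauss(0.0, 1.5)
    population.append((x, y))

# Restriction on the independent variable, then on the dependent variable.
restricted_on_x = [(x, y) for x, y in population if 3.0 < x < 7.0]
restricted_on_y = [(x, y) for x, y in population if 3.0 < y < 7.0]

slope_all = ls_slope(population)
slope_restricted_x = ls_slope(restricted_on_x)
slope_restricted_y = ls_slope(restricted_on_y)

print(f"no restriction:     slope = {slope_all:.3f}")
print(f"3 < x < 7 retained: slope = {slope_restricted_x:.3f}")
print(f"3 < y < 7 retained: slope = {slope_restricted_y:.3f}")
```

Under these assumptions the slope fitted after restricting x stays close to the true value, while the slope fitted after restricting y is pulled markedly toward zero. In this sketch it does not vanish entirely, as line (II) does in the artificial data of Figure 1.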

The demonstration that we have chosen is simple and artificial but the conclu- 
sions drawn apply in general, namely, the restriction of X does not alter the 
regression of Y on X, but the restriction of Y does. For further illustrations 
and a very readable discussion see Chapter 19 of Methods of Correlation Analysis 
by Mordecai Ezekiel. 

As a special case of a situation in which the “observed” y’s are restricted in
some way or other we may turn the problem around and note the limiting case 
where Y is not a random variable at all but is given certain assigned values by the 
research worker and the corresponding values of X are ascertained by observa- 
tion. It is evident from what has gone before that in such a case any formula 
that expresses the average value of y for a given value of x for the data thus col- 
lected is useless for inferring anything about the average value of Y for a given 
value of X in the “population” at large.


5. Variables Subject to Biological Variation and also to Errors of Observation. 
In the preceding paragraphs we have been supposing that the variables were 
subject either to errors of measurement, or to biological variation, but we 
excluded the case in which both types of variation were in operation simul- 
taneously. It is reasonable to suppose that errors of measurement are present in 
biological work just as they are in the physical sciences, though it will usually 
be found that the variability between biological specimens is far greater than the 
maximum variability that could be attributed to errors of measurement. Ac- 
cordingly, in most biological work true biological variations force errors of 
measurement into the background. It is usually possible to check up on this 
by making two or more determinations for each specimen and then comparing 
the variation between determinations with the variation between specimens by 
means of the analysis of variance technique developed by R. A. Fisher [3]. 
When only one determination is made per specimen the two variations cannot be 
distinguished. 

Even if observational errors are in the background, it is of importance to 
know the consequences to be expected when they are superimposed on biological 
variation. Ezekiel discusses this phase of the subject in detail in chapter 19 of 
his book mentioned earlier, and a survey of his conclusions in terms of what we 
have discussed above will be sufficient for our purposes: (a) If $\bar{Y}_X$ denotes the
average value of Y corresponding to the X denoted by the subscript (in a certain
sense of the word “average”) and is a linear function (8) of X, then if the X
values are free from errors of measurement but the y values are subject to
random errors, uncorrelated with the true Y values, and which average out in
the long run (in the same sense of “average” as above), then (4) fitted by the
method consistent with the meaning of “average” provides an unbiased estimate
of (8), in the sense that its “average” value in repeated sampling will be (8),
and the effect of the errors of measurement is merely to decrease the precision 
with which (8) can be estimated from the given set of X values; (b) if the situation 
is as in (a) with the exception that the errors are correlated with the true Y 
values, then not only will their presence affect the precision of (4) as an estimate 
of (8), but it will render (4) a biased estimate of (8), the tendency being an 
underestimation of the existing correlation; (c) if random errors affect the inde- 
pendent variable correlated or uncorrelated with its true values, then (4) will 
be an unreliable estimate of (8), and may be markedly biased whether or not the 
errors of measurement affect the dependent variable; and, if non-random errors of 
measurement are present they tend to render (4) a more or less unreliable esti- 
mate of (8), quite regardless of the variables to which they apply. 

The practical significance of these principles in regard to variables subject to 
biological variations is that if large errors of measurement enter into the deter- 
mination of some variable, provided these errors are random that variable may 
still be used as the dependent variable without introducing appreciable bias in 
the estimation equation if enough observations are available to approximately 
balance out the errors; but any use of that variable as the independent variable 
will almost surely yield results that understate the actual relationship, and if the 
errors are not random, they will tend to bias the results quite regardless of the 
variables affected by them. 
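
The practical upshot of these principles can be illustrated by simulation. The following is a minimal sketch under assumed conditions, not a reproduction of Ezekiel's examples: the true slope of 2, the standard deviations, the seed, and the helper `ls_slope` are hypothetical. Random errors of measurement are added first to the dependent variable only, and then to the independent variable only, and the least-squares slope is computed in each case.

```python
import random


def ls_slope(xs, ys):
    """Least-squares slope of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


rng = random.Random(7)  # fixed seed for reproducibility
BETA = 2.0              # assumed true slope
n = 20_000

# True values: X varies "biologically"; Y scatters about 1 + BETA * X.
true_x = [rng.gauss(0.0, 1.0) for _ in range(n)]
true_y = [1.0 + BETA * x + rng.gauss(0.0, 0.5) for x in true_x]

# Random errors of measurement, uncorrelated with the true values.
observed_y = [y + rng.gauss(0.0, 1.0) for y in true_y]  # errors in Y only
observed_x = [x + rng.gauss(0.0, 1.0) for x in true_x]  # errors in X only

slope_no_errors = ls_slope(true_x, true_y)
slope_errors_in_y = ls_slope(true_x, observed_y)
slope_errors_in_x = ls_slope(observed_x, true_y)

print(f"no measurement errors:     {slope_no_errors:.3f}")
print(f"errors in dependent Y:     {slope_errors_in_y:.3f}")
print(f"errors in independent X:   {slope_errors_in_x:.3f}")
```

With these assumed variances, errors in Y leave the fitted slope near 2 and only widen its scatter. Errors in X of the same size as the spread of X roughly halve the slope, which is the understatement of the relationship described above.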

6. An Industrial Problem. With the preceding discussion in mind let us now 
direct our attention to a problem that arises in connection with the manufacture 
of cheese. One of the measures of the quality of a cheese is the percent of fat 
it contains. In the cheesemaker’s notation this is given by the fat-dry matter
ratio, F/DM, which is usually written as percent since the fat is contained in the
total dry matter. Experience in cheese making has shown that the casein-fat
ratio, C/F, of the milk out of which the cheese is made influences the F/DM
of the finished cheese, and that the relationship is approximately linear, with a
negative slope, for the range of values of these variables usually studied. 

Since 45% is the lower limit of F/DM for an acceptable cheese as specified by 
law, cheese manufacturers are interested in standardizing the C/F ratio of the 
milk they use, which they can do by separating the milk and cream from indi- 
vidual sources and then putting them together again in proper proportions so 
that the resulting cheese will have a good chance of meeting the legal require- 
ment at least. Figure 2 portrays some results obtained by standardizing the 
C/F ratio at different values, the individual points representing 149 different
batches of cheese manufactured in October, 1936 at a particular factory.⁶ It
is seen that the relationship prevailing between C/F and F/DM in these data 
takes the form of a rather wide sloping band and not as a close clustering of 
points about a well-defined trend. 

6 These data were brought to me by Professor Walter V. Price, of the Department of
Dairy Industry of the University of Wisconsin, in connection with a different but related
problem, and I wish to acknowledge my gratitude to him for permission to use them in
the present discussion. It will be noted that F/DM is given as a per cent, whereas C/F
is given as a decimal fraction. This is the customary procedure with dairymen, and
arises from the fact that C/F is merely an index involving two different quantities
distinguishable in the milk, and cannot be interpreted as a per cent in the same way as the
F/DM ratio.

If a cheese manufacturer is able to infer from data of this sort a reliable answer
to a question like the following, he will be able to improve the economic efficiency
of his plant: “To what value should C/F be standardized in order that we may
expect F/DM to exceed 45 in, say, 95% of our future experience?” Unfortunately
this type of question, very easy to phrase, is usually exceedingly difficult
to answer, and, indeed, the very existence of an answer depends on an assumption
of some sort of stability in the manufacturing process, and in the materials
used, which enables a future observation to be estimated at least within limits
from available experience. In the succeeding paragraphs we shall present a
solution that will depend for its applicability upon the following assumptions:

Let Y denote the true F/DM ratio of a finished cheese, X the true C/F ratio
in the milk from which it was made, and let $\bar{Y}_X$ denote the true arithmetic mean
of Y associated with the value of X indicated by the subscript.
Assumption I: We shall assume that the dependence of $\bar{Y}_X$ on X is linear
and given by


(8’)    $\bar{Y}_X = \alpha + \beta X = \alpha' + \beta(X - \bar{x})$, with $\alpha' = \alpha + \beta\bar{x}$,


where $\bar{x}$ denotes the arithmetic mean of the true C/F values corresponding to the
points shown in Figure 2.

It should be noted that $\bar{x}$ and its value do not enter into the specification of
the linear relationship but only into the alternative expression of it.

Assumption II: We shall assume that X is determined without error in a
given instance, and the differences $(y_X - \bar{Y}_X)$ between the observed values of
F/DM, say $y_X$, and their corresponding mean values, $\bar{Y}_X$, may be regarded as
drawn independently at random from a population in which $(y_X - \bar{Y}_X)$ are
normally distributed about zero with a variance, $\sigma_{Y \cdot X}^2$, which is the same for
all values of X.

Since these assumptions are restrictive it is necessary in connection with a 
given practical problem to ascertain whether they are acceptable on the avail- 
able evidence before proceeding to the application to the problem in hand of 
methods depending on them for validity. Before applying to a problem of his
own any of the methods presented in the following paragraphs, the reader
should investigate the tenability of these assumptions with regard to his type
of data. Methods for examining whether data of a given type exhibit “statistical
control” are available in the literature and the reader is referred especially
to the writings of W. A. Shewhart [9, 10]. To date experience has shown
that it is very difficult to attain and maintain statistical stability in connection
with industrial processes. On the other hand, it is useless to try to answer
questions of inference such as the above until a fair degree of statistical stability
is attained, whether statistical processes are employed or not. The success
along these lines that has been attained in industry is a great tribute to Shewhart
and his insistence on attention to this phase of the application of statistical
methods to practical problems. The sooner workers in other fields turn their
attention to questions of statistical control, the sooner mathematical statistics
will be of some value to them.

From an examination of C/F and F/DM values from the same factory over 
a period of months it appears that although a relation of the type (8’) above 
seems to exist in most instances, it is not stable with regard to the values of $\alpha$
and $\beta$. Consequently, unless the source of this instability can be discovered
and either removed, or allowed for, the answer to the above question is more or
less unattainable. In order to exemplify the method, however, we shall proceed
as if statistical control were a fact and assumptions I and II tenable.

It is clear, I think, from comments in the early part of this paper that if we
let Y = F/DM and X = C/F, since the C/F values have been chosen by the 
cheese makers, we shall have to infer about X from the relation of Y to X, 
the latter being considered as the independent variable. Furthermore, it is a
consequence of assumption II that fitting


(4’)    $\hat{Y} = a + bX = a' + b(X - \bar{x})$


by least squares will provide the most accurate estimates of $\alpha$ and $\beta$ in (8’).
That $a' = \bar{y}$, the arithmetic mean of the observed y values, is evident when (4’)
is compared with (4) and (6). Performing the calculations it was found that


(11)    $\hat{Y} = 64.38 - 24.58X = 43.63 - 24.58(X - .8439)$,


for the data shown in Figure 2.

If now we ask “What value of C/F will to the best of our knowledge result
in F/DM = 45 on the average in the future?”, the answer is obtained by setting
$\hat{Y} = 45$ in (11) and solving for X, from which it is found that C/F (= X) should
be taken equal to (64.38 - 45.00)/24.58 = .7884, and this point is indicated
by the black dot with white center on the line in Figure 2. We must remember,
however, that (11) is merely an estimate of (8’), and that the value of $\hat{Y}$, namely
45, obtained by inserting X = .7884 in (11), is merely an estimate of the true
$\bar{Y}_{.7884}$, which may not be 45 at all. Indeed the use of $\hat{Y}$ for a particular value
of X to estimate the true $\bar{Y}_X$ for that X is mathematically equivalent to the
customary procedure of using $\bar{y}$, the mean of all of the observed y, to estimate $\bar{Y}$,
the true mean of the Y population.

In recent years it has become customary to perform such estimations, not
by single value, but by means of confidence intervals, a confidence interval for $\bar{Y}$
being of the form

$Y_1 < \bar{Y} < Y_2$,

where $Y_1$ and $Y_2$ are functions of the observed values of Y, i.e. of $y_1, y_2, \ldots, y_N$,
and of the confidence coefficient chosen. If a confidence coefficient of
$1 - \epsilon$ is adopted ($\epsilon > 0$), then the interpretation of such an inequality is as
follows: If inequalities of this form are used whenever it is desired to estimate $\bar{Y}$
from the observed y’s, then in the long run we may expect $100(1 - \epsilon)$% of
such estimations to be correct, that is, in $100(1 - \epsilon)$% of the cases in which
we apply intervals of form (6) they will include $\bar{Y}$ within their limits. Such
limits are sometimes referred to as fiducial limits and the associated degree of
confidence termed the fiducial probability of the estimation being correct.⁷
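
The long-run interpretation just described can be illustrated by a sampling experiment. The sketch below is an assumption-laden illustration only: the population (normal, mean 45, standard deviation 1), the sample size of 10, the number of trials, and the seed are hypothetical. The figure 2.262 is the tabulated two-sided 5% value of Student's t for 9 degrees of freedom.

```python
import math
import random
import statistics

rng = random.Random(42)  # fixed seed for reproducibility
TRUE_MEAN = 45.0         # hypothetical population mean
SIGMA = 1.0              # hypothetical population standard deviation
N = 10                   # observations per sample
T_05 = 2.262             # two-sided 5% value of t for N - 1 = 9 d.f.
TRIALS = 4000

hits = 0
for _ in range(TRIALS):
    ys = [rng.gauss(TRUE_MEAN, SIGMA) for _ in range(N)]
    y_bar = statistics.mean(ys)
    s = statistics.stdev(ys)  # divisor N - 1
    half_width = T_05 * s / math.sqrt(N)
    # Count the trial as "correct" when the interval includes the true mean.
    if y_bar - half_width < TRUE_MEAN < y_bar + half_width:
        hits += 1

coverage = hits / TRIALS
print(f"intervals including the true mean: {100 * coverage:.1f}%")
```

No single interval can be declared right or wrong from the sample alone. What the experiment shows is that the proportion of correct assertions settles near 95% over many repetitions.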











7 There is an ever-growing literature on this mode of estimation, and a list of references 
to expository treatments of the subject will be found at the end of the paper together with 
a few other pertinent references. 

From Fisher’s 1935 paper it appears that he wishes to restrict the use of the words
fiducial probability, fiducial limits, etc. to the cases in which a sufficient statistic exists for
the parameter to be estimated. Since he introduced the use of these words in this connection,
he has some sort of right to specify their usage. Accordingly Neyman’s confidence
intervals are of more general availability, and when a sufficient statistic does exist both
the fiducial limits and the limits of Neyman’s shortest confidence intervals (or of his short
unbiased confidence intervals) will be found to depend on this sufficient statistic, although
the interval between the limits may not be the same in the two cases, Neyman bringing an
additional principle into play to assist in the location of his intervals.

We shall now show how to set up confidence intervals for $\bar{Y}_X$ in terms of $\hat{Y}$
for that X, and by an extension of the argument, we shall show how to make a
probability statement about the difference $(y' - \hat{Y})$ in repeated sampling,
where y’ is an observation not involved in the evaluation of $\hat{Y}$. The connection
of this type of probability statement to the question asked above will be pointed
out and its relation to the ideal answer to that question discussed.

In the succeeding paragraphs we shall make use of the following mathematical
results:

(A) Assumptions I and II imply that in repeated samples involving the
same values of X the fitted line $\hat{Y}$ of (4’) will be normally distributed about
the true line $\bar{Y}_X$ of (8’) with a variance


(12)    $\sigma_{\hat{Y}}^2 = \sigma_{a'}^2 + (X - \bar{x})^2 \sigma_b^2$

in which

(13)    $\sigma_{a'}^2 = \sigma_{Y \cdot X}^2 / N$,    $\sigma_b^2 = \sigma_{Y \cdot X}^2 / \sum(X - \bar{x})^2$,


where $\sum$ denotes summation over the N actual values of X involved, $\bar{x}$ is the
arithmetic mean of these values of X, and $\sigma_{Y \cdot X}^2$ is the true variance of Y for
a fixed value of X (and assumed independent of X). The condition that the
sampling be confined to the same values of X is an essential part of the statement
as can be seen from the original argument by Working and Hotelling
[12] which is outlined by Rider [6]. The result is given by Fisher [3] sec. 26.

(B) When $\sigma_{Y \cdot X}^2$ is unknown, a convenient estimate from the sample is


(14)    $s_{y \cdot x}^2 = \sum(y - \hat{Y})^2 / (N - 2)$,


the distribution of $(N - 2)\, s_{y \cdot x}^2 / \sigma_{Y \cdot X}^2$ being as $\chi^2$ with N - 2 degrees of freedom
and independent of the distribution of $(\hat{Y} - \bar{Y}_X)$.

(C) Student-Fisher theorem: The ratio of any quantity d normally distributed
about zero with standard deviation $\sigma$, to an estimate s having the property
that $n s^2 / \sigma^2$ is distributed independently of d as $\chi^2$ with n degrees of freedom,
is itself distributed as Student’s t for n degrees of freedom.⁸

8 Fisher [4]; “Student” [11].

Letting $S_{\hat{Y}}^2$ denote the estimate of $\sigma_{\hat{Y}}^2$ obtained by substituting $s_{y \cdot x}^2$ for $\sigma_{Y \cdot X}^2$
in the quantities (13), it follows from (A)-(C) that


(15)    $t = \dfrac{\hat{Y} - \bar{Y}_X}{S_{\hat{Y}}}$


is distributed as Student’s t for N - 2 degrees of freedom. Consequently if
$t_{.05}$ denotes the number for which $P\{|t| > t_{.05}\} = .05$ where t is as in (15),
and |t| denotes the numerical value of t, it follows that the probability is .95
that random variations in the y’s for the values of X chosen will yield a value
of $\hat{Y}$ for which


(16)    $-t_{.05}\, S_{\hat{Y}} < \hat{Y} - \bar{Y}_X < +t_{.05}\, S_{\hat{Y}}$

is true, that is, a value of $\hat{Y}$ for which

(17)    $\hat{Y} - t_{.05}\, S_{\hat{Y}} < \bar{Y}_X < \hat{Y} + t_{.05}\, S_{\hat{Y}}$


is true. Accordingly, if we assert in a given instance that (17) is true, there is
no way of telling whether our assertion is correct, but in the long run the $\hat{Y}$’s
we calculate from the data we observe may be expected to differ from their $\bar{Y}_X$
values in such manner that (16) will be correct in 95% of our experience, so that
we may expect to be correct in 95% of the assertions we make about $\bar{Y}_X$
using (17).
For the data of Figure 2 the quantities needed in addition to (11) are

$1/N = 1/149 = .00671141$,    $\sum(X - \bar{x})^2 = .274796$,

$s_{y \cdot x}^2 = .9448$,    $t_{.05} = 1.979$, for 147 degrees of freedom.

For X = .7884 it is easy to verify that $(X - .8439)^2 = .0030$, and substituting
in (12) with $\sigma_{Y \cdot X}^2$ replaced by $s_{y \cdot x}^2$ gives $S_{\hat{Y}} = .1290$ for X = .7884, and, since
$\hat{Y}$ equals 45 for this value of X, we may assert

(18)    $44.744 < \bar{Y}_{.7884} < 45.256$,

and we are correct in this assertion unless a 1 in 20 chance event has occurred.
Since these limits do not differ widely from 45, we see that we may hazard the 
prediction that, if X = C/F is standardized to .7884, then the values of Y = 
F/DM in our future experience will be distributed about a mean fairly close 
to 45. This prediction is based not only on the assumption that we are sampling 
a stable statistical population, but also on the presumption that (18) is true.
$\bar{Y}_{.7884}$ may really lie outside and at quite a distance from this interval. The
results of a sampling experiment which illustrate this point in connection with 
confidence limits for a sample mean will be found in Shewhart [10]. 
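
The arithmetic leading to (18) can be retraced in a few lines. This is a sketch using only the quantities quoted in the text; the function names `fitted` and `se_fitted` are hypothetical. Because the squared deviation is carried here without rounding to .0030, the standard error comes out a shade larger than the .1290 quoted (about .130), and the limits differ from (18) only in the third decimal place.

```python
import math

# Quantities quoted in the text for the data of Figure 2.
N = 149
SUM_SQ_X = 0.274796   # sum of (X - x_bar)^2
S2_YX = 0.9448        # s^2_{y.x}, the estimate (14)
T_05 = 1.979          # two-sided 5% value of t quoted for 147 d.f.
X_BAR = 0.8439
A = 64.38             # intercept of the fitted line (11)
B_SLOPE = -24.58      # slope of the fitted line (11)


def fitted(x):
    """Value of the fitted line (11) at x."""
    return A + B_SLOPE * x


def se_fitted(x):
    """Estimated standard error of the fitted line at x, from (12) and (13)."""
    return math.sqrt(S2_YX * (1.0 / N + (x - X_BAR) ** 2 / SUM_SQ_X))


# Value of C/F at which the fitted line equals 45.
x0 = (A - 45.0) / (-B_SLOPE)
s_fit = se_fitted(x0)
lower = fitted(x0) - T_05 * s_fit
upper = fitted(x0) + T_05 * s_fit

print(f"X at which the fitted line is 45: {x0:.4f}")
print(f"standard error of the fitted value: {s_fit:.4f}")
print(f"95% limits for the true mean: {lower:.3f} to {upper:.3f}")
```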

Let us now see how the preceding type of argument may be extended to take 
into consideration a single additional y (= F/DM) value. Let y’ denote an
additional value of Y not included among those used to construct the regression
$\hat{Y}$, and let X’ be the value of X to which y’ corresponds. If y’ be an independent
observation, then


$(y' - \bar{Y}_{X'})$ and $(\hat{Y}' - \bar{Y}_{X'})$,


where $\hat{Y}'$ denotes the value of $\hat{Y}$ corresponding to X = X’, are normally and
independently distributed about zero with variances $\sigma_{Y \cdot X}^2$ and $\sigma_{\hat{Y}'}^2$, respectively.
Since the difference of two quantities normally and independently distributed
about zero is also distributed normally about zero with variance equal to the
sum of the respective variances, it follows that $(y' - \bar{Y}_{X'}) - (\hat{Y}' - \bar{Y}_{X'}) = (y' - \hat{Y}')$
is normally distributed about zero with the variance $\sigma_{Y \cdot X}^2 + \sigma_{\hat{Y}'}^2$.
Using $s_{y \cdot x}^2$ to estimate $\sigma_{Y \cdot X}^2$, which is involved in both of these terms, it follows
from (C) that

(19)    $t = \dfrac{y' - \hat{Y}'}{\sqrt{S_{\hat{Y}'}^2 + s_{y \cdot x}^2}}$,

where $\hat{Y}'$ is the value of (4’) for X = X’, y’ is an additional value of Y for
X = X’, and $S_{\hat{Y}'}$ is the value of $S_{\hat{Y}}$ for X = X’, is distributed as Student’s t for
N - 2 degrees of freedom. It should be noticed that here the estimate $s_{y \cdot x}^2$
obtained in connection with $\hat{Y}$ carries all of the burden of estimating $\sigma_{Y \cdot X}^2$.
Accordingly, unless our combined experience with regard to y’ and $\hat{Y}'$ is such as
would occur 1 time in 20, i.e. unless t of (19) numerically exceeds $t_{.05}$ for N - 2
degrees of freedom, it follows that


(20)    $-t_{.05}\sqrt{S_{\hat{Y}'}^2 + s_{y \cdot x}^2} < y' - \hat{Y}' < t_{.05}\sqrt{S_{\hat{Y}'}^2 + s_{y \cdot x}^2}$,

which may also be written as

(21)    $\hat{Y}' - t_{.05}\sqrt{S_{\hat{Y}'}^2 + s_{y \cdot x}^2} < y' < \hat{Y}' + t_{.05}\sqrt{S_{\hat{Y}'}^2 + s_{y \cdot x}^2}$.


If, therefore, y’ denotes a future observation, unless our experience to date
(contained in $\hat{Y}$ and $S_{\hat{Y}}$) and our future experience with regard to y’ are such
as to make t of (19) exceed $t_{.05}$ numerically (it being supposed we are sampling
a statistically stable universe), then if we predict limits for y’ by means of (21)
we can associate a confidence of .95 with this combined procedure; that is, if we
make a habit of evaluating regression lines $\hat{Y}$ and of predicting new observations
with their aid by means of (21), then in 95% of the cases in which we take
independent paired steps of this sort we may expect to be correct with regard to
our prediction of y’. It should be noted that if $\hat{Y}$ is “away out” in the first
place, which may occur by chance, the combined experience of y’ and $\hat{Y}'$ will
probably be “away out” too, although y’ may be near $\bar{Y}_X$ where it belongs.
The 95% wager applies to the combined steps of getting $\hat{Y}$ and y’ and not to
the single step of laying off an interval about $\hat{Y}$ in hopes of “catching” y’. In
consequence one should not keep on using one regression $\hat{Y}$ over and over
again, but should be continually amending “experience to date” as data accumulate.⁹
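
The limits (21) for a single new observation can be evaluated with the figures quoted for the cheese data. This is a sketch only; the function name `prediction_limits` is hypothetical. It shows how much wider these limits are than the limits (18) for the mean, since the full residual variance is added under the radical.

```python
import math

# Quantities quoted in the text for the data of Figure 2.
N = 149
SUM_SQ_X = 0.274796   # sum of (X - x_bar)^2
S2_YX = 0.9448        # s^2_{y.x}
T_05 = 1.979          # two-sided 5% value of t quoted for 147 d.f.
X_BAR = 0.8439
A_PRIME = 43.63       # a' = mean of the observed y values
B_SLOPE = -24.58      # fitted slope b


def prediction_limits(x_new, t=T_05):
    """Limits (21) for one new observation y' taken at X = x_new."""
    fitted = A_PRIME + B_SLOPE * (x_new - X_BAR)
    s2_fitted = S2_YX * (1.0 / N + (x_new - X_BAR) ** 2 / SUM_SQ_X)
    half_width = t * math.sqrt(s2_fitted + S2_YX)
    return fitted - half_width, fitted + half_width


low, high = prediction_limits(0.7884)
print(f"limits for a new F/DM value at C/F = .7884: {low:.2f} to {high:.2f}")
```

At C/F = .7884 the limits come to roughly 43.1 and 46.9, against 44.7 and 45.3 for the mean. The .95 attaches to the combined procedure of fitting the line and observing y’, not to this interval taken alone.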

9 H. Working and H. Hotelling discussed this use of regression to forecast future values,
but did not, as far as I can see, emphasize the confidence interval nature of the argument,
nor the fact that the probability concerned refers to the two steps involved, and not merely
to the latter. The same may be said with regard to Schultz’s paper [8].

It should be noted that the above procedure does not yield us an interval
which may be expected to include 95% of the future values of y. Such a range
would be an estimate of the range within which 95% of the population values
lie. The difficulties attending the estimation of this type of range are discussed
by Shewhart [10], and it appears from his work that in the present state
of our knowledge very large samples are required for this purpose. In addi- 
tion, by a beautiful example, Shewhart shows how a failure to distinguish 
between confidence intervals associated with a given confidence coefficient, say 
.95, and intervals containing 95% of the population values, can lead to state- 
ments which are quite false. 

Recalling to mind that we have been going through all of this reasoning with
the aim of finding a way of deciding to what value of C/F (= X) we should
tell the dairyman to standardize his milk if he wishes to produce cheese for
which F/DM (= Y) is 45 at least, we see that our problem consists in getting a
lower limit to y’ where X’ is the value at which we shall advise him to standardize.
If, therefore, we leave the right side of the inequality (21) open so that
we have


(22)    $\hat{Y}' - t'_{.05}\sqrt{S_{\hat{Y}'}^2 + s_{y \cdot x}^2} < y'$,

where $t'_{.05}$ is the value of t for which $P\{t < -t'_{.05}\} = .05$, the sign of the t value
in (19) being considered now, then we seek that value of X which makes the
left side of this equal to 45. For, if y’ correspond to this value of X, call it X’,
then unless our experience to date plus our future experience with y’ is such as we
may expect to occur 1 time in 20 in the long run, y’ will be greater than 45, as
desired. In other words, we want to solve


(23)    $a + b(X' - \bar{x}) - t'_{.05}\, s_{y \cdot x}\sqrt{\dfrac{N + 1}{N} + \dfrac{(X' - \bar{x})^2}{\sum(x - \bar{x})^2}} = Q$

for X’, where Q = 45 in this case. By straightforward algebra the general
solution is found to be


(24)    $X' = \bar{x} + \dfrac{b(Q - a)}{C} \pm \dfrac{t'_{.05}\, s_{y \cdot x}}{C}\sqrt{B(Q - a)^2 + \left(\dfrac{N + 1}{N}\right)C}$


in which $a = \bar{y}$, $B = 1/\sum(x - \bar{x})^2$, and $C = b^2 - (t'_{.05})^2 (s_{y \cdot x}^2)(B)$, and the sign
before the last term is + if b is positive and - if b is negative.

From the data involved in the present problem N = 149, $\bar{x} = .8439$,

$a = 43.63$, $b = -24.58$, $B = 3.6391$, $s_{y \cdot x}^2 = .9448$, $s_{y \cdot x} = .9720$,

and for $t'_{.05} = 1.656$, the one-sided 5% value for 147 degrees of freedom, C =
594.7479.

Substituting these values in (24) we find X’ = .7207, and this is the value to
which the dairyman should standardize his C/F ratio. If he does, then unless
the experience to date, leading to $\hat{Y}$ of (11), and the future experience with
regard to any new y (= F/DM) value are together
such as to shove the t of (19) beyond the one-sided 5% value of t for 147 degrees
of freedom and in the negative direction, the ultimate value of y (= F/DM)
will be 45 at least. In this sense we may have 95% confidence that our prediction
will be correct.

It is clear that the preceding solution can be set up for any desired degree of
confidence, say $1 - \epsilon$, by choosing $t'_{\epsilon}$, which is the value of t for which
$P\{t < -t'_{\epsilon}\} = \epsilon$ for the degrees of freedom involved. Furthermore, if an upper
limit, instead of a lower limit, were desired, the solution would be the same
except for an interchanged usage of the + and - signs before the last term of
(24): for an upper limit one would take a - if b were positive and a + if b were
negative. For values of Q not too different from $\bar{y}$ it will usually be possible
to find the solution corresponding to the level of confidence desired. However,
it is quite possible that a solution may not exist for the value of Q
desired, if this be too distant from $\bar{y}$. This difficulty will arise whenever
$[(N + 1)/N](t'_{\epsilon})^2 s_{y \cdot x}^2 B$ is larger than $B(Q - \bar{y})^2 + [(N + 1)/N]b^2$, in which case
the radical is imaginary, and no real solution of (24) exists. By graphing the
left side of (22) for several values of X’ the reason why such cases occur can
be readily appreciated.
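
The solution (24) is easily checked numerically. The sketch below uses the quantities quoted in the text; the function and argument names are hypothetical. It assumes that C is positive, that is, that the fitted slope is clearly distinguishable from zero; this restriction is an assumption of the sketch and is not stated in the text. A negative radicand is reported as the case in which no real solution exists.

```python
import math


def c_coefficient(b, t, s2, big_b):
    """C = b^2 - t^2 * s^2_{y.x} * B, as defined after (24)."""
    return b * b - t * t * s2 * big_b


def standardizing_value(q, a, b, x_bar, big_b, s2, n, t, lower=True):
    """Solve (23) for X' by formula (24).

    lower=True gives the X' whose one-sided lower limit for y' equals q;
    lower=False interchanges the sign, as described for an upper limit.
    """
    c = c_coefficient(b, t, s2, big_b)
    if c <= 0:
        raise ValueError("sketch assumes C > 0 (slope distinguishable from zero)")
    d = q - a
    radicand = big_b * d * d + (n + 1) / n * c
    if radicand < 0:
        raise ValueError("radical is imaginary: no real solution of (24)")
    sign = 1.0 if b > 0 else -1.0  # + if b is positive, - if b is negative
    if not lower:
        sign = -sign
    return x_bar + b * d / c + sign * t * math.sqrt(s2) / c * math.sqrt(radicand)


# Figures quoted in the text for the cheese data.
c_value = c_coefficient(b=-24.58, t=1.656, s2=0.9448, big_b=3.6391)
x_standard = standardizing_value(
    q=45.0, a=43.63, b=-24.58, x_bar=0.8439,
    big_b=3.6391, s2=0.9448, n=149, t=1.656,
)
print(f"C  = {c_value:.4f}")
print(f"X' = {x_standard:.4f}")
```

With these inputs C agrees with the quoted 594.7479 to about three decimals, and X’ rounds to the quoted .7207.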

Since the confidence coefficient in reality relates to the difference $(y' - \hat{Y})$
in which both y’ and $\hat{Y}$ are random variables, when applying this method to a
particular industrial (or other) problem, one should make repeated $\hat{Y}$ estimates
of $\bar{Y}_X$ from time to time in order to insure that the $\hat{Y}$ used is not away off
from $\bar{Y}_X$. As mentioned earlier $\hat{Y}$ will assess $\bar{Y}_X$ more accurately if the X
values used are spread over a rather wide range; this follows from the nature
of (12). By frequent determinations of $\hat{Y}$ even better estimates of $\bar{Y}_X$ can be
obtained by pooling the data to date, provided no departures from statistical
stability are detected. In this way an increasingly reliable estimate of X’ can be
determined. By standardizing with X = X’ and keeping an eye on the resulting
y values, one will be able to see whether this choice of X’ is operating satisfactorily.
Also, and more important probably, by standardizing X = X’ and
applying control charts as described by Shewhart [9] and Pearson [5] to the
observed y values, one may detect the first signs of a change in conditions “some
time before this could be discovered by cruder methods, such as mere inspection
of columns of figures.”


7. Assaying an Unknown with the Aid of a Previously Established Relationship.

Having come this far, only one step farther is required to obtain a solution 
to a class of problems having the general nature of the following: A previously 
calculated regression, Y, being available, a new value y’ is observed and the 
value of X, say X’, to which it corresponds has been lost sight of, or was never 
known. What value of X should be taken as the best single estimate of X’, 
and within what limits can we assess X’ with a confidence coefficient of .95 say? 

From our previous discussion it is clear, I think, that in repeated sampling
of both $\hat{Y}$ and y’ the inequality (21) should hold 95% of the time, if $t_{.05}$ is the
value for which $P\{|t| > t_{.05}\} = .05$. Accordingly, unless our present experience
with regard to $\hat{Y}$ and y’ is in the upper or in the lower .025 tail of the t-distribution,
y’ is related to $\hat{Y}$ as indicated by (21). But the left side of (21) is really
the same as the left side of (23) with $t_{.05}$ in place of $t'_{.05}$, and the right side of
(21) can likewise be obtained from the left side of (23) by replacing $t'_{.05}$ by
$-t_{.05}$, and in both cases y’ corresponds to Q, X’ being unknown as in the previous
problem. In short, by setting Q = y’ in (24) and replacing $t'_{.05}$ by $t_{.05}$, we can
use this revised (24) to obtain upper and lower limits for X’, and unless our
combined experience with regard to $\hat{Y}$ and y’ is such as would occur 1 time
in 20, the value of X which truly corresponds to y’ will be within these limits.
The “best” single estimate will be $X' = \bar{x} + (y' - \bar{y})/b$, which can be obtained
from (24) by setting t = 0, and it should be noted that the upper and lower
limits of X’ for a given confidence level are not symmetrical with respect to
this value. With regard to the data of Figure 2, if our new value y’ = 45,
and if the confidence desired were merely .90 (so that we can use $t'_{.05} = t_{.10}$),
the calculations yield .7207 < X’ < .8539 with X’ = .7884 as the best single
estimate.
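
The limits just quoted can be retraced with the revised form of (24). This is a sketch using the figures given for the cheese data; the function name `limits_for_unknown_x` is hypothetical, and the sketch assumes C > 0. Computed directly from $\bar{x} + (y' - \bar{y})/b$ with the rounded values a = 43.63 and b = -24.58, the best single estimate comes out near .788, agreeing with the quoted .7884 to within rounding.

```python
import math


def limits_for_unknown_x(y_new, y_bar, b, x_bar, big_b, s2, n, t):
    """Lower limit, best single estimate, and upper limit for X'
    from one new observation y', using (24) with Q = y' and both signs."""
    c = b * b - t * t * s2 * big_b
    if c <= 0:
        raise ValueError("sketch assumes C > 0 (slope distinguishable from zero)")
    d = y_new - y_bar
    centre = x_bar + b * d / c
    half_width = t * math.sqrt(s2) / c * math.sqrt(big_b * d * d + (n + 1) / n * c)
    best = x_bar + d / b  # formula (24) with t = 0
    return centre - half_width, best, centre + half_width


# y' = 45 with confidence .90, so the two-sided 10% value 1.656 is used.
lower, best, upper = limits_for_unknown_x(
    y_new=45.0, y_bar=43.63, b=-24.58, x_bar=0.8439,
    big_b=3.6391, s2=0.9448, n=149, t=1.656,
)
print(f"{lower:.4f} < X' < {upper:.4f}, best single estimate {best:.4f}")
print(f"distance below best: {best - lower:.4f}, above best: {upper - best:.4f}")
```

The two distances printed on the last line differ, which is the lack of symmetry about the best single estimate noted above.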

It is unlikely that a dairyman would ever be interested in obtaining limits 
for C/F from the F/DM value of a finished cheese, so that he would probably 
never have any use for this additional technique. On the other hand the pre- 
ceding situation is a common one in connection with problems of biological 
assay where it is desired to evaluate the potency of a substance by comparing 
the response it produces, when administered to one or more animals, with a 
dosage-response relation previously established with dosages of known strength. 
In the preceding problem we considered the case in which y’ was a single addi-
tional observation corresponding to an unknown X’. If, instead, we had $\bar{y}'$, 
the mean value of N’ additional observations corresponding to an unknown X’, 
it is clear that the denominator of (19) will be $\sqrt{s_Y^2 + s_{y\cdot x}^2/N'}$ in this case, so 
that confidence limits for X’ corresponding to a confidence coefficient of .95 
will be 

$$X' = \bar{x} + \frac{b(\bar{y}' - \bar{y})}{C} \pm \frac{t_{.05}\,s_{y\cdot x}}{C}\sqrt{B(\bar{y}' - \bar{y})^2 + \left(\frac{N + N'}{NN'}\right)C} \tag{25}$$

and the “best” single estimate of X’ will be 

$$X' = \bar{x} + \frac{\bar{y}' - \bar{y}}{b}, \tag{26}$$

where $\bar{y}$ is the mean of the y’s in the analysis of the original N values; 
$\bar{y}'$ is the mean of the additional N’ y’s corresponding to the unknown X’; 
b is the regression coefficient in (4’); 
$s_{y\cdot x}$ is given by (14) and depends on the scatter of the original y values 
about the regression Y, and is based on N − 2 degrees of freedom; 
$B = 1/\Sigma(x - \bar{x})^2$, the summation being over the original X values; 
$t_{.05}$ = two-sided 5% significance level of t for N − 2 degrees of freedom; 
and $C = b^2 - t_{.05}^2\,s_{y\cdot x}^2\,B$. 
In practice N’ is usually small compared with N, so that $s_{y\cdot x}$ based on the 
original analysis will probably be used. However, if it is desired to make use 
of the dispersion of the new y’ values to “improve” the estimate of $\sigma_{y\cdot x}$, then 
$s^2 = [(N - 2)s_{y\cdot x}^2 + (N' - 1)s'^2]/(N + N' - 3)$ should be used in place of 
$s_{y\cdot x}^2$, where $s'^2 = \Sigma(y' - \bar{y}')^2/(N' - 1)$, and the $t_{.05}$ value corresponding to 
(N + N’ − 3) degrees of freedom used. Mathematically this is preferable to 
the above, but involves considerably more calculating, and probably would not 
be used by the practical man. 
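
The limits for X’ and the best single estimate described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `inverse_limits` and its argument names are hypothetical, the values of `s_yx`, `B` and `t` in the usage lines are illustrative and are not taken from the paper, and the interval is computed as the two roots of $(d - bu)^2 = t^2 s_{y\cdot x}^2\,(1/N + 1/N' + Bu^2)$ with $d = \bar{y}' - \bar{y}$ and $u = X' - \bar{x}$.

```python
import math

def inverse_limits(x_bar, y_bar, b, s_yx, B, N, y_new_mean, N_new, t):
    """Return (lower, best, upper) for the unknown X'.

    The limits are the roots of (d - b*u)**2 = (t*s_yx)**2 * (1/N + 1/N_new + B*u**2),
    with d = y_new_mean - y_bar and u = X' - x_bar; 'best' is x_bar + d / b.
    """
    d = y_new_mean - y_bar
    C = b * b - (t * s_yx) ** 2 * B
    if C <= 0:
        # The slope is not significantly different from zero: no finite interval.
        raise ValueError("C must be positive for a finite interval")
    centre = x_bar + b * d / C
    half = (t * s_yx / C) * math.sqrt(B * d * d + (N + N_new) / (N * N_new) * C)
    best = x_bar + d / b
    return centre - half, best, centre + half

# Illustrative call: x_bar, y_bar and b are the values quoted for the rat data;
# s_yx, B and t are made-up values chosen only to exercise the function.
lower, best, upper = inverse_limits(-1.0777, 6.9023, 3.4004, 0.40, 0.14, 69, 7.5, 4, 2.0)
print(lower, best, upper)
```

The best single estimate always falls inside the interval but not at its centre, which is the asymmetry remarked on in the text.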

We shall illustrate the use of (25) and (26) in connection with the data of 
Figure 3 obtained from autopsies of 69 rats which had received doses of estradiol 
varying from 0.025 micrograms to .2 micrograms.¹⁰ It was found that a 
linear relation, with a common variance on the various dosages, existed between 
$X = \log_{10}$ dose and $Y = \sqrt{\text{uterine wt.}}$ These are the quantities portrayed in 
Figure 3. The least squares line is 

$$Y = 6.9023 + 3.4004(X + 1.0777) = 10.567 + 3.400X, \tag{27}$$


and is seen to be a good fit. 


Carrying through the necessary calculations we find that 95% confidence 
limits for X’, the true log dose, corresponding to a mean response of $\bar{y}'$ based 
on N’ values, are 

$$X' = -1.0777 + 0.2964(\bar{y}' - 6.9023) \pm .07074\sqrt{.1376(\bar{y}' - 6.9023)^2 + \left(\frac{N' + 69}{69N'}\right)(.09062)} \tag{28}$$

and the optimum single estimate is 

$$X' = 0.2941\,\bar{y}' - 3.1077. \tag{29}$$


Dr. C. I. Bliss has informed me in correspondence that seldom is the sensi- 
tivity of an animal species to a hormone or other drug constant enough for the 
actual procedure outlined above to be reliable, so that in assaying any given 
sample it should always be tested in parallel with a standard preparation. If 
the slope of the regression, i.e. b, is fairly stable, even though the position 
changes, it is possible to assay the relative strength of an unknown by admin- 
istering it and a standard at a single dilution, but it is preferable to use at least 
two dilutions in each assay so that it may be discovered whether the new b 
agrees substantially with that given by the standard dosage-response curve. 
Discussions of the procedures to be used in these cases will be found in references 
[26] to [31] from which I have received much inspiration. 


10 These data have been discussed by Lauson, Heller, Golden, and Sevringhaus [32] 
of the Wisconsin General Hospital, to whom I extend thanks for permission to use them 
in the present paper. Only a portion of their data have been used as the linear relation 
discussed below failed outside of the dosage limits given above. 
8. Concluding Remarks. The formulae and ideas presented in this paper 
have been drawn in the main from the articles and books listed at the end of 
this paper. By turning to these references the reader often will find a fuller 
account of methods and applications than has been given here. In many cases 
the reader will find that the author of one of the references has placed emphasis 
on getting the answer. In the present paper the emphasis has been on the 
ideas and assumptions involved, the aim being to promote understanding of the 
methods discussed. In particular, the following two points have been stressed 
here: 
(a) When the values of one of a pair of random variables are selected by 
the research worker, or when one of the variables is allowed to take values 
in only a restricted portion of its real range, then inferences with regard to an 
unknown value of this variable, say X, based on the corresponding (known) 
value of the other variable, say Y, are mathematically valid only when inferred 
from the relationship giving Y as a function of X; and 

(b) The resulting inference is in the form of a confidence interval whose 
confidence coefficient is associated with the joint experience consisting of the 
observed regression of Y on X and the observed (or future) additional sample 
involving the unknown value of X, and not merely with the latter. 

The ideas and assumptions which have been discussed have been illustrated 
on two examples. 

Closer coöperation is possible between the practical man and the statistical 
theorist when the latter fully appreciates the problems of the former, and 
when the former, in turn, understands the methods advocated by the latter. 


THE UNIVERSITY OF WISCONSIN. 
NOTES 
This section is devoted to brief research and expository articles, notes 
on methodology and other short items. 


NOTE ON THE $L_1$ TEST FOR MANY SAMPLES 
By A. M. Mood 


Neyman and Pearson¹ have discussed a method for testing the hypothesis 
that k samples have been drawn from normal populations with the same vari-
ances by means of a statistical function, $L_1$, defined by 

$$L_1 = \prod_{t=1}^{k}\left(\frac{s_t^2}{s^2}\right)^{n_t/N},$$

where $n_t$ is the number of elements in the t-th sample, $s_t^2$ is the sample variance 
and 

$$s^2 = \frac{1}{N}\sum_{t=1}^{k} n_t s_t^2, \qquad N = \sum_{t=1}^{k} n_t.$$

For convenience, we shall denote $L_1^{N/2}$ by $\lambda$. In their paper Neyman and Pearson 
have found the moments of $\lambda$ and have shown that the distribution of $-2\log_e\lambda$ 
approaches that of $\chi^2$ with k − 1 degrees of freedom when the number of ele-
ments in each of the k samples becomes large. In some applications of this 
test the question arises as to whether the $\chi^2$ law is a good approximation when 
the number of samples is large in comparison with the number of elements in 
each sample. For example, in a certain educational study, the number of 
schools was much greater than the number of pupils in each school, and it was 
desired to test for heterogeneity of variances of scores on a given examination 
using $L_1$ as the criterion. The purpose of this note is to examine the behavior 
of the $L_1$ test for large values of k. 

Wilks has obtained the distribution of $\lambda$ as a definite integral; it is, however, 
a rather cumbersome form to handle. The procedure here will be simply to 
compare the first few semi-invariants of $-2\log\lambda$ with those of $\chi^2$. The p-th 
moment of $\lambda$ is² 

$$\mu'(p) = \frac{N^{pN/2}\,\Gamma\!\left(\frac{N-k}{2}\right)}{\Gamma\!\left(\frac{(p+1)N-k}{2}\right)}\prod_{t=1}^{k}\frac{\Gamma\!\left(\frac{(p+1)n_t-1}{2}\right)}{n_t^{\,pn_t/2}\,\Gamma\!\left(\frac{n_t-1}{2}\right)}. \tag{1}$$


1 “On the Problem of k Samples,” Bulletin de l’Académie Polonaise des Sciences et des 
Lettres, Série A (1931), pp. 460-481. 
2 Ibid., p. 472. 
Since 

$$E\left(e^{\theta(-2\log\lambda)}\right) = E\left(\lambda^{-2\theta}\right),$$

the characteristic function of $-2\log\lambda$ is obtained on replacing p by $-2\theta$ in (1), 
where $\theta = it$, t being a real variable. The logarithm of the characteristic func-
tion is the generating function of the semi-invariants; denoting the latter by 
$\psi(\theta)$, we have 

$$\psi(\theta) = \log\mu'(-2\theta). \tag{2}$$

After substitution of (1) in (2), the resulting expression can be simplified by 
means of the Weierstrass factored form of $1/\Gamma(z)$ which is 

$$\frac{1}{\Gamma(z)} = z\,e^{\gamma z}\prod_{r=1}^{\infty}\left(1 + \frac{z}{r}\right)e^{-z/r},$$

where $\gamma$ is the Euler constant .577. The final result is 

$$\psi(\theta) = \theta\left[\sum_{t=1}^{k} n_t\log n_t - N\log N\right] + \sum_{r=0}^{\infty}\log\frac{2r + N' - k}{2r + N - k} - \sum_{t=1}^{k}\sum_{r=0}^{\infty}\log\frac{2r + n_t' - 1}{2r + n_t - 1}, \tag{3}$$

where $N' = N(1 - 2\theta)$ and $n_t' = n_t(1 - 2\theta)$. 
The semi-invariants of $-2\log\lambda$ are given by the derivatives of $\psi(\theta)$ evaluated 
at $\theta = 0$; these will be denoted by $\lambda_1, \lambda_2, \cdots$. $\lambda_1$ and $\lambda_2$ are the mean and 
variance respectively, and in general the semi-invariants are related to the 
moments, $\mu_s$, by³ 

$$\mu_s = \sum_{i=0}^{s-1}\frac{(s-1)!}{i!\,(s-1-i)!}\,\mu_i\,\lambda_{s-i}. \tag{4}$$



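From the generating function (3) we obtain: 

$$\lambda_1 = \psi'(0) = \sum_{t=1}^{k} n_t\log n_t - N\log N + \sum_{r=0}^{\infty}\left[\sum_{t=1}^{k}\frac{2n_t}{2r + n_t - 1} - \frac{2N}{2r + N - k}\right], \tag{5}$$

$$\lambda_s = \psi^{(s)}(0) = (s-1)!\sum_{r=0}^{\infty}\left[\sum_{t=1}^{k}\frac{(2n_t)^s}{(2r + n_t - 1)^s} - \frac{(2N)^s}{(2r + N - k)^s}\right], \qquad s = 2, 3, \cdots. \tag{6}$$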


3 See e.g., Charles Jordan, Statistique Mathématique, p. 41. 
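The infinite sums can be well approximated by integration when the $n_t$ are 
moderately large, giving 

$$\lambda_1 = \sum_{t=1}^{k} n_t\log\frac{n_t(N - k - 1)}{N(n_t - 2)}, \tag{7}$$

$$\lambda_s = (s-2)!\,2^{s-1}\left[\sum_{t=1}^{k}\frac{n_t^s}{(n_t - 2)^{s-1}} - \frac{N^s}{(N - k - 1)^{s-1}}\right], \qquad s = 2, 3, \cdots, \tag{8}$$

and when the samples are of equal size, that is 

$$n_1 = n_2 = \cdots = n_k = n, \qquad N = kn,$$

equations (7) and (8) become 

$$\lambda_1 = kn\log\left(1 + \frac{k-1}{k(n-2)}\right), \tag{9}$$

$$\lambda_s = (s-2)!\,2^{s-1}\left[\frac{kn^s}{(n-2)^{s-1}} - \frac{(kn)^s}{(kn - k - 1)^{s-1}}\right], \qquad s = 2, 3, \cdots. \tag{10}$$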


It is worth noting that these last two expressions are monotonic decreasing 
functions of n for a fixed k > 1; hence when the sample sizes are unequal the 
true values of the $\lambda_s$ lie between the values given by substituting the least and 
greatest $n_t$ for n in (9) and (10). This fact supports the suggestion of Nayer⁴ 
on page 47 of his paper on the application of the $L_1$ test. He has computed 
tables for the critical values of $L_1$ when the sample sizes are equal, and suggests 
that when the sizes are unequal but not radically different, the average value 
of $n_t$ may be used. 

The limiting values given by 

$$\lambda_s = (s-1)!\,2^{s-1}(k-1), \qquad s = 1, 2, 3, \cdots \tag{11}$$

are the semi-invariants of $\chi^2$ with k − 1 degrees of freedom as is easily verified 
by induction using (4) and the following expression for the moments of $\chi^2$ 
with m degrees of freedom: 

$$\mu_s = m(m + 2)(m + 4)\cdots(m + 2s - 2).$$

For a fixed n > 2 the quantities 

$$\frac{\lambda_s}{(s-1)!\,2^{s-1}(k-1)}$$

are monotonic decreasing functions of k; however, the variation is rather slight, 
as is evident from the following table: 





4 “An Investigation into the Application of the Neyman and Pearson $L_1$ Test, with 
Tables of Percentage Limits,” Statistical Research Memoirs, Vol. I (1936), pp. 38-51. 
                                    n = 20                n = 100 
                               k = 10    k = ∞       k = 10    k = ∞ 

$\lambda_1/(k - 1)$                 …      1.081       1.016     1.015 
$\lambda_2/[2(k - 1)]$              …      1.170       1.032     1.031 
$\lambda_3/[8(k - 1)]$              …        …         1.048     1.046 
$\lambda_4/[48(k - 1)]$           1.384    1.369       1.065     1.062 


These results indicate that the degree of approximation of $-2\log\lambda$ to the $\chi^2$ 
law with k − 1 degrees of freedom is mainly dependent on n, and is for all prac-
tical purposes independent of k when n is moderately large. 
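
The tabulated ratios can be checked numerically from the equal-sample-size approximations (9) and (10). The following Python lines are a minimal sketch: the function names `ratio` and `ratio_limit` are hypothetical, and the closed form used for the limit as k becomes infinite is worked out here from (9) and (10) rather than quoted from the note.

```python
import math

def ratio(s, n, k):
    """lambda_s from (9) or (10), divided by the chi-square value (s-1)! 2^(s-1) (k-1)."""
    if s == 1:
        lam = k * n * math.log(1 + (k - 1) / (k * (n - 2)))
    else:
        lam = math.factorial(s - 2) * 2 ** (s - 1) * (
            k * n ** s / (n - 2) ** (s - 1)
            - (k * n) ** s / (k * n - k - 1) ** (s - 1)
        )
    return lam / (math.factorial(s - 1) * 2 ** (s - 1) * (k - 1))

def ratio_limit(s, n):
    """Limit of ratio(s, n, k) as k grows without bound (derived here, not quoted)."""
    if s == 1:
        return n * math.log((n - 1) / (n - 2))
    return n ** s * ((n - 2) ** (1 - s) - (n - 1) ** (1 - s)) / (s - 1)

for n in (20, 100):
    for s in (1, 2, 3, 4):
        print(n, s, round(ratio(s, n, 10), 3), round(ratio_limit(s, n), 3))
```

For a given n the two columns differ very little, which is the point made in the text: the departure from the $\chi^2$ values is governed by n rather than by k.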
ON TCHEBYCHEFF APPROXIMATION FOR DECREASING FUNCTIONS 
By C. D. Smith 


The problem of estimating the value of a probability by means of moments 
of a distribution function has been studied by Tchebycheff, Pearson, Camp, 
Meidel, Narumi, Markoff, and others. Approximations without regard to the 
nature of the function have not been very close. However the closeness of the 
approximation has been materially improved by placing certain restrictions on 
the nature of the distribution function.¹ For example, when y = f(x) is an 
increasing function from x = 0 to $x = c\sigma$ and a decreasing function beyond that 
point, the corresponding probability function $y = P_x$ is concave downward 
from x = 0 to $x = c\sigma$ and concave upward beyond that point. Here $P_x$ is the 
probability that a variate taken at random from the distribution will fall at a 
distance at least as great as x from the origin. Beginning with these conditions 
I have established the inequality² 

1 B. H. Camp, “A New Generalization of Tchebycheff’s Statistical Inequality,” Bul-
letin of the American Mathematical Society, Vol. 28, (1922), pp. 427-32. 

2 C. D. Smith, “On Generalized Tchebycheff Inequalities in Mathematical Statistics,” 
The American Journal of Mathematics, Vol. 52, (1930), pp. 109-26. 
The upper bound was obtained by substituting $P_{t\sigma}$ for $P_{c\sigma}$, t > c, and the special 
values c = r = 1, and t = 2, gave the result $P_{2\sigma} \le .092$. 

The purpose of this paper is to give an estimate of $P_{c\sigma}$ which will substantially 
improve the approximation to the value of $P_{t\sigma}$ obtained from (1). Let y = f(x) 
be a monotonic increasing function from x = 0 to $x = c\sigma$ and a monotonic de-
creasing function from $x = c\sigma$ to the upper end of the range of x. With $P_x$ as 
the probability that a variate taken at random from the distribution will deviate 
from the origin by an amount at least x we know that the curve of $y = P_x$ is 
concave downward from x = 0 to $x = c\sigma$ and concave upward beyond that point. 
When y = f(x) is of finite range the probability curve and the curve of y = f(x) 
will terminate at the same point on the x-axis. The probability curve will 
approach the x-axis when the range of the function is infinite. In either case 
we may take a distribution y = g(x) to follow the curve of the given function 
from x = 0 to $x = c\sigma$ and to follow a horizontal line from $x = c\sigma$ to a finite 
distance A from the origin and such that the area under the curve is the same 
as that under the curve of y = f(x). Obviously the probability curve for 
y = g(x) will be a straight line from $(c\sigma, P_{c\sigma})$ to the point (A, 0), and since the 
curve of $y = P_x$ is concave upward beyond $(c\sigma, P_{c\sigma})$ it will remain below a 
straight line to a point very near (A, 0). Also it is evident that the straight 
line has a y-intercept greater than unity since a straight line beginning at (0, y) 
and extending a distance A from the origin would give a probability function 
whose graph follows the straight line from (0, 1) to (A, 0). Obviously the 
ordinates of this probability graph for values of x in the interval from x = 0 
to $x = c\sigma$ are less than the corresponding ordinates for the curve which in-
creases for x in the same interval and then follows the horizontal line. Hence 
a line through points (0, 1) and $(c\sigma, P_{c\sigma})$ is above the line $(c\sigma, P_{c\sigma})$ to (A, 0) 
for all points beyond $c\sigma$. 

We may use the line through points (0, 1) and $(c\sigma, P_{c\sigma})$ as a basis for estimating 
$P_{c\sigma}$ in (1). The equation of the line is $y = \frac{P_{c\sigma} - 1}{c\sigma}\,x + 1$ with x-intercept 
$\frac{c\sigma}{1 - P_{c\sigma}}$. The line remains above the curve of $y = P_x$ from $x = c\sigma$ to a point 
very near the x-intercept and so we may use the line from $x = c\sigma$ to the crossing 
point. The range of validity seems to be sufficient for practical use since $P_{c\sigma}$ 
is usually near .9 and c is a fraction. For $P_{c\sigma} = .9$, c = .5, the intercept of the 
line is approximately $5\sigma$. Let the ordinate under the line be $y_{t\sigma}$, (t > c), and 
then $P_{c\sigma} = 1 + \frac{c}{t}(y_{t\sigma} - 1)$. For the probability curve $y = P_x$ we have $P_{c\sigma} > 
1 + \frac{c}{t}(P_{t\sigma} - 1)$ since $y_{t\sigma} > P_{t\sigma}$. Substitution of $1 + \frac{c}{t}(P_{t\sigma} - 1)$ for $P_{c\sigma}$ in 
(1) gives 


-@ (1-9 ) 
| t(2r + 1) ; ad 
e ~o(i i ) ta < i_ P,,’ 6 as in (1). 


~ &(2r + 1) 
To indicate the amount of improvement let c = r = 1, and t = 2. From 
(1) $P_{2\sigma} \le .092$ while from (2) $P_{2\sigma} \le .056$. One may work from any origin 
other than the mean by letting $h = c\sigma$ in (2). 
CORRECTION OF SAMPLE MOMENT BIAS DUE TO LACK OF HIGH 
CONTACT AND TO HISTOGRAM GROUPING 


By Dinsmore ALTER 


The first correction of sample moment bias was devised by W. F. Sheppard 
[1]. His method corrects for histogram grouping on the assumption of high 
contact at both ends of the frequency curve. Usually this is a sufficient 
correction. In some cases, however, of J-shaped curves the error remaining 
is even more serious than in the original histogram moments. 

A method developed by E. Pairman and Karl Pearson [2] makes a complete 
correction for both of these sources of bias. The only advantage claimed for 
the method to be developed here over theirs lies in simplicity of mathematical 
theory. 

A third correction is given by Elderton [3]. In his method he assumes that 
there is no error due to histogram grouping and he develops a correction for lack 
of high contact, in so far as the zero-th moment is concerned. The following 
work may be thought of largely as an extension of his method although it will 
have certain variations. 

Let $A_x$ and $\nu'_m$ be defined as follows, 

$$A_x = \int_{t=-\frac{1}{2}}^{+\frac{1}{2}} y_{x+t}\,dt$$

$$\nu'_m\,\Sigma A_x = \Sigma x^m A_x$$

The definite integrals are the areas of the histogram rectangles if a scale of x 
be chosen to reduce their width to unity. Let $\mu'_m$ be defined by 

$$\mu'_m\,\Sigma A_x = \int_{l_1}^{l_2} x^m y_x\,dx$$

In the first equation the x’s form a series of equally spaced constants. In the 
second, x is a continuous variable. The summations are to extend over the 
equally spaced values of x. 
If the data form a histogram, $l_1$ and $l_2$ are respectively the values of x at the 
left edge of the left-hand rectangle and the right edge of the right-hand one. 
If the data are the values of $y_x$ at isolated points, $l_1$ is the value of x one-half unit 
smaller than the smallest value given in the sample and $l_2$ is one-half greater 
than the largest. It would be perfectly satisfactory, of course, to define these 
limits differently. As defined, however, they parallel the histogram case. 
Distributions of this latter type will be called point frequency distributions. 

As is customary, the primed moments denote those about an arbitrary origin. 
Moments corrected for lack of high contact and for grouping will be denoted by 
$\mu'_m$ or by $\mu_m$ if taken about the mean. Numerical raw moments will be denoted 
by ${}_n\mu'_m$. There are two entirely different methods of approach to this bias 
problem. 

(a) The bias may be put into the algebraic form of the frequency curve and 
equated directly to the numerical raw moments. In the case of a point fre-
quency distribution such a method forms the algebraic values of $y_x$ for each 
point given in the sample and, therefore, puts the raw moments into algebraic 
form to be equated to the numerical ones. This is the simplest method of 
correction if the distribution is a power series. For most types the method 
leads into difficulties which complicate it beyond practical use. 

(b) The raw moments given, whether ${}_n\mu'_m$’s or $\nu'_m$’s, can be corrected to ap-
proximate very closely the desired $\mu'_m$’s as defined above. 

A point frequency distribution gives ${}_n\mu'_m = \Sigma x^m y_x$. If there is high contact 
${}_n\mu'_m$ is an unbiased observed estimate of $\mu'_m$. This second form of method will 
be developed here primarily as a correction to ${}_n\mu'_m$. 

Only one assumption is involved. Fifth differences of $y_x$ will be considered 
as negligible. Any interpolation formula is available but Stirling’s will be used. 
Using Stirling’s formula: 
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~320.~S 


m E mx” 23m(m — 1)(m -— - 2)a"" | 





+ Az ja — Da™™* , m(m — 1)(m — 2)(m — ae 


~ 1440 80640 


— av | Bae - i 4 29m(m — 1)(m — 2) (m — =e] 


107520 9289728 
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+ terms involving (m — »} 
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6 + 160 720 ~ 53760 
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17x" 23 mt 23 iv 
- ¢2 + ens) A: — 77920 a.'| 


’ ’ , in x 1). x z\ 
it 2[(F+h)u+(Z +z )a. 


32° | 1 \ nw (172° Se) m — ( 232" 29 ) iv 
a (F * aus) 7 (Fi * 3360)“ — \soe0 + 387072) 47 | 
Ordinarily it will not be necessary to use all of the corrective terms. 


For point frequency distributions the application of these equations is direct. 
The $\nu'_m$’s may be computed from 

$$A_x = y_x + \frac{\Delta''_x}{24} - \frac{17\,\Delta^{iv}_x}{5760}$$

and the definition of $\nu'_m$. There is, however, a theoretical difficulty in a case 
for which the data have been given as a histogram. In such a case the values 
of $A_x$ are all that have been known originally. The $\Delta$’s are not the ones de-
manded by the equation. The relationship to the proper ones is simple: 


— An 

Ke. OA, ge oe ~ AS, , 

a 5790 (AH — AS), oe 
It is possible to compute the proper $\Delta$’s from this equation but the discrepancy is 
small and moreover the $\Delta$’s are used only in corrective terms. Probably the 
error involved by use of the wrong $\Delta$’s is negligible in any actual case of data 
that ever will be studied. In the numerical example to follow, the very slight 
errors remaining in the $\mu'_m$’s are due, probably, to this neglect. 
Pairman and Pearson gave a numerical example in which both the lack of 
high contact and the grouping introduced large errors. They started with 
$y_x = 100{,}000\sqrt{x}$ and from this formed ten values of $A_x$. From these they 
computed the $\nu'_m$’s and corrected them to get the $\mu'_m$’s. The exact values of the 
latter were already known to them through integration of the original equation. 
The following table compares four values of moments from these data. 

 m |   $\nu'_m$   | $\mu'_m$ by Sheppard’s | $\mu'_m$ with Pairman-Pearson | Method Developed | True Values 
   |            |   Formula    |   Full Corrections    |      Here        | 
 1 |     5.9880 |       5.9880 |                5.9994 |           5.9996 |      6.0000 
 2 |    42.6900 |      42.6067 |               42.8570 |          42.8576 |     42.8571 
 3 |   331.0854 |     329.5884 |              333.3349 |         333.3387 |    333.3333 
 4 |  2698.7735 |    2677.4576 |             2727.2757 |        2727.3555 |   2727.2727 








Despite the use of the $\Delta$’s of the $A_x$ instead of the $\Delta$’s of the $y_x$, the results of this method are 
almost as good as by the older one. The method has the additional advantage 
of unifying the theories of the correction of moments from the two types of 
distribution. 
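
The raw, Sheppard-corrected and true columns of the table can be approximately reproduced with a short Python sketch. It assumes, as the true values suggest, that the ten unit cells cover the range from x = 0 to x = 10 with centres at 0.5, 1.5, ..., 9.5; the function names are hypothetical, and the Sheppard corrections used are the usual ones for unit class width about an arbitrary origin. The Pairman-Pearson corrections and the method of this note are not reproduced.

```python
def cell_area(lo, hi):
    """Area under y = sqrt(x) between lo and hi (the factor 100,000 cancels in the ratios)."""
    return (2.0 / 3.0) * (hi ** 1.5 - lo ** 1.5)

centres = [i + 0.5 for i in range(10)]
areas = [cell_area(c - 0.5, c + 0.5) for c in centres]
total = sum(areas)

def raw(m):
    """Raw histogram moment: sum of x^m * A_x divided by the sum of A_x."""
    return sum(c ** m * a for c, a in zip(centres, areas)) / total

def true(m):
    """Integral of x^m * sqrt(x) over (0, 10) divided by the integral of sqrt(x)."""
    return 1.5 * 10.0 ** m / (m + 1.5)

# Sheppard's corrections for unit class width, moments about an arbitrary origin.
sheppard = {
    1: raw(1),
    2: raw(2) - 1.0 / 12.0,
    3: raw(3) - raw(1) / 4.0,
    4: raw(4) - raw(2) / 2.0 + 7.0 / 240.0,
}

for m in (1, 2, 3, 4):
    print(m, raw(m), sheppard[m], true(m))
```

Because the areas here are carried to full floating-point precision, the last printed digits need not agree exactly with the tabulated figures.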
FREQUENCY DISTRIBUTION OF PRODUCT AND QUOTIENT 


By E. V. Huntington 








The main purpose of this note is to establish Theorems 1 and 2. For the 
sake of completeness, the more familiar Theorems 3 and 4 are appended. All 
four of these theorems have numerous applications in the theory of frequency 
distributions. While the proofs of Theorems 1 and 2 in the elementary forms 
here given (and used in my class-room notes since 1934) can hardly be new, they 
seem not to be readily accessible in the current text-books. 

THEOREM 1. Suppose a variable x is distributed in accordance with a probability 
law $\int_0^\infty f(x)\,dx = 1$; and a variable y in accordance with a probability law $\int_0^\infty F(y)\,dy 
= 1$, x and y being independently distributed. Then the product, u = xy, will be 
distributed according to the law $\int_0^\infty P(u)\,du = 1$, where 

$$P(u) = \int_0^\infty f(u/y)\,F(y)\,(1/y)\,dy.$$

(The definite integral is a convenient representation of a probability law, since 
the limits on the integral sign indicate the interval over which the probability 
law is defined.) 

PROOF. Represent the distribution of x by the density of dots along the 
axis of x, and the distribution of y by the density of dots along the axis of y. 
Since, by definition, the (relative) number of dots in an interval dx is f(x)dx 
and the (relative) number of dots in an interval dy is F(y)dy, and since each 
dot in the interval dx is paired with each dot in the interval dy (in accordance 
with the hypothesis of independence), it follows that the (relative) number of 
dots in the corresponding area dxdy will be [f(x)dx][F(y)dy]. 

Now for fixed values of u and Δu, plot the curves xy = u and xy = u + Δu 
in the xy plane, as shown in Figure 1. Then the (relative) number of dots in the 
area bounded by these two curves is precisely what is meant by P(u)Δu. Hence 
the expression P(u)Δu may be built up by integrating the expression 
f(x)dx·F(y)dy over this area, as follows. 

$$P(u)\,\Delta u = \int_0^\infty\left[\int_{u/y}^{(u+\Delta u)/y} f(x)\,F(y)\,dx\right]dy = \int_0^\infty\left[f(x')\,F(y)\int_{u/y}^{(u/y)+(\Delta u/y)} dx\right]dy,$$

where x’ is a mean value of x between x = u/y and x = (u/y) + (Δu/y). Now 
at every point in the plane, x = u/y (since u = xy). Hence we have: 

$$P(u)\,\Delta u = \int_0^\infty\left[f(u'/y)\,F(y)\,(1/y)\,\Delta u\right]dy = \left[\int_0^\infty f(u'/y)\,F(y)\,(1/y)\,dy\right]\Delta u,$$

from which the theorem follows immediately. 
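
As a numerical illustration of Theorem 1 (not part of the original note), suppose x and y are each uniformly distributed on the interval from 0 to 1. The integral for P(u) then reduces to the integral of 1/y from u to 1, that is, to −log u. The Python sketch below approximates the integral by a simple midpoint rule; the function names and the step count are arbitrary choices made for this example.

```python
import math

def product_density(f, F, u, y_max=1.0, steps=200_000):
    """Midpoint-rule approximation of the integral of f(u/y) F(y) (1/y) dy over (0, y_max)."""
    h = y_max / steps
    total = 0.0
    for i in range(steps):
        y = (i + 0.5) * h
        total += f(u / y) * F(y) / y
    return total * h

def uniform01(t):
    """Density of a variable uniformly distributed between 0 and 1."""
    return 1.0 if 0.0 < t < 1.0 else 0.0

for u in (0.1, 0.5, 0.9):
    print(u, product_density(uniform01, uniform01, u), -math.log(u))
```

The two printed columns agree closely, the small residual coming from the jump of the integrand at y = u.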
THEOREM 2. Suppose a variable x is distributed in accordance with a proba-
bility law $\int_0^\infty f(x)\,dx = 1$; and a variable y in accordance with a probability law 
$\int_0^\infty F(y)\,dy = 1$, x and y being independently distributed. Then the quotient, 
z = x/y, will be distributed according to the law $\int_0^\infty Q(z)\,dz = 1$, where 

$$Q(z) = \int_0^\infty f(zy)\,F(y)\,y\,dy.$$

PROOF. As in the proof of Theorem 1, the (relative) number of dots in the 
area dxdy will be [f(x)dx][F(y)dy]. 
Now for fixed values of z and Δz, plot the lines x/y = z and x/y = z + Δz 
in the xy plane, as shown in Figure 2. Then the (relative) number of dots in 
the area between these lines is precisely what is meant by Q(z)Δz. Hence the 
expression Q(z)Δz may be built up by integrating the expression f(x)dx·F(y)dy 
over this area, as follows: 

$$Q(z)\,\Delta z = \int_0^\infty\left[\int_{zy}^{(z+\Delta z)y} f(x)\,F(y)\,dx\right]dy = \int_0^\infty\left[f(x')\,F(y)\int_{zy}^{zy+y\Delta z} dx\right]dy,$$

where x’ is a mean value of x between x = zy and x = zy + yΔz. Now at every 
point in the plane, x = zy (since z = x/y). Hence we have 

$$Q(z)\,\Delta z = \int_0^\infty\left[f(z'y)\,F(y)\,y\,\Delta z\right]dy = \left[\int_0^\infty f(z'y)\,F(y)\,y\,dy\right]\Delta z,$$

from which the theorem follows immediately. 
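
Theorem 2 can be illustrated in the same spirit (again, this example is not from the original note). If x and y each follow the law $e^{-t}$ on the interval from 0 to infinity, the integral for Q(z) is the integral of $y\,e^{-(1+z)y}$, which equals $1/(1+z)^2$. The Python sketch below approximates the integral with a midpoint rule on a truncated range; the truncation point, step count and function names are arbitrary choices for this example.

```python
import math

def quotient_density(f, F, z, y_max=60.0, steps=200_000):
    """Midpoint-rule approximation of the integral of f(z*y) F(y) y dy over (0, y_max)."""
    h = y_max / steps
    total = 0.0
    for i in range(steps):
        y = (i + 0.5) * h
        total += f(z * y) * F(y) * y
    return total * h

def exp_density(t):
    """Density e^(-t) for t >= 0, zero otherwise."""
    return math.exp(-t) if t >= 0.0 else 0.0

for z in (0.5, 1.0, 2.0):
    print(z, quotient_density(exp_density, exp_density, z), 1.0 / (1.0 + z) ** 2)
```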


For convenience of reference, we include the corresponding theorems for the 
sum and difference, the proofs of which have long been well known. 


THEOREM 3. If x obeys a law $\int_0^\infty f(x)\,dx = 1$, and y obeys a law $\int_0^\infty F(y)\,dy = 1$, 
then the sum, s = x + y, will obey the law $\int_0^\infty \psi(s)\,ds = 1$, where 

$$\psi(s) = \int_0^s f(s - y)\,F(y)\,dy.$$

The proof consists in integrating f(x)F(y)dxdy over the area bounded by the 
two lines x + y = s and x + y = s + Δs, as shown in Figure 3. 


THEOREM 4. If x obeys a law $\int f(x)\,dx = 1$, and y obeys a law $\int F(y)\,dy = 1$, 
then the difference, w = x − y, will obey the law $\int R(w)\,dw = 1$, where 

$$R(w) = \int f(w + y)\,F(y)\,dy.$$

The proof consists in integrating f(x)F(y)dxdy over the area bounded by the 
two lines x − y = w and x − y = w + Δw, as shown in Figure 4. 
MOMENTS ABOUT THE ARITHMETIC MEAN OF A 
HYPERGEOMETRIC FREQUENCY DISTRIBUTION 


By Harold D. Larsen 


In a recent paper¹ Kirkman has developed a method of continuation for 
obtaining the moments of a binomial distribution. Although other investi-
gators² have found various methods which are perhaps superior from the 
standpoint of elegance and compactness, Kirkman’s method is of some impor-
tance inasmuch as it is adaptable to use in a course in elementary statistics. 
With this thought in mind, we shall extend Kirkman’s method to obtain the 
moments of the hypergeometric distribution of Table I.³ 

TABLE I 

Variate      Relative Frequency 
   v               $P_v$ 

   0      ${}_nC_0\,\alpha^{(0)}\beta^{(n)}/N^{(n)}$ 
   1      ${}_nC_1\,\alpha^{(1)}\beta^{(n-1)}/N^{(n)}$ 
   ⋮ 
   n      ${}_nC_n\,\alpha^{(n)}\beta^{(0)}/N^{(n)}$ 


1 W. J. Kirkman, “Moments About the Arithmetic Mean of a Binomial Frequency 
Distribution,” Ann. Math. Statist., vol. vi, no. 2, June, 1935, pp. 96-101. 

2 For example, J. Riordan, “Moment Recurrence Relations for Binomial, Poisson and 
Hypergeometric Frequency Distributions,” Ann. Math. Statist., vol. viii, no. 2, June, 
1937, pp. 103-111. 

3 For the Poisson distribution, this method degenerates into the application of a well-
known recursion formula. 
The hypergeometric distribution above can be conceived as being generated 
in the following manner. From an urn containing N balls, α = Np white and 
β = Nq black, n balls are drawn without replacements. The probability that 
exactly v of the balls are white is 

$$P_v = {}_nC_v\,\alpha^{(v)}\beta^{(n-v)}/N^{(n)},$$

where 

$$\alpha^{(v)} = \alpha(\alpha - 1)(\alpha - 2)\cdots(\alpha - v + 1), \qquad \alpha^{(0)} = 1, \text{ etc.}$$

It may be noted in passing that the hypergeometric distribution reduces to a 
binomial distribution when n = 1, or N = ∞. 

For the distribution of Table I, let $m_k$ denote the kth moment about the 
origin, and let $\mu_k$ denote the kth moment about the arithmetic mean. Then 
by definition 

$$m_k = \sum_{v=0}^{n} v^k P_v,$$

and 

$$\mu_k = \sum_{v=0}^{n}(v - m_1)^k P_v.$$

It is apparent that these moments are functions of the parameters α, β, n and N. 
In particular, 

$$m_k = F(\alpha, \beta, n, N).$$

We shall have need of the hypergeometric distribution of Table II. For the 
latter distribution, let $\nu_k$ denote the kth moment about the origin; i. e., 

$$\nu_k = \sum_{v=0}^{n-1} v^k P'_v.$$


TABLE II 

   v               $P'_v$ 

   0      ${}_{n-1}C_0\,(\alpha - 1)^{(0)}\beta^{(n-1)}/(N - 1)^{(n-1)}$ 
   1      ${}_{n-1}C_1\,(\alpha - 1)^{(1)}\beta^{(n-2)}/(N - 1)^{(n-1)}$ 
   2      ${}_{n-1}C_2\,(\alpha - 1)^{(2)}\beta^{(n-3)}/(N - 1)^{(n-1)}$ 
   ⋮ 
 n − 1    ${}_{n-1}C_{n-1}\,(\alpha - 1)^{(n-1)}\beta^{(0)}/(N - 1)^{(n-1)}$ 


Comparing Table I with Table II, we see at once that 

$$\nu_k = F(\alpha - 1, \beta, n - 1, N - 1). \tag{1}$$

In other words, $\nu_k$ is equal to the expression obtained from $m_k$ upon replac-
ing α, n, and N respectively by α − 1, n − 1, and N − 1. 
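Now consider 

$$m_k = \sum_{v=0}^{n} v^k P_v = \sum_{v=1}^{n} v^k P_v.$$

Replacing v by v + 1, we have 

$$m_k = \sum_{v=0}^{n-1}(v + 1)^k P_{v+1} = \frac{n\alpha}{N}\sum_{v=0}^{n-1}(v + 1)^{k-1}\,\frac{(n-1)!}{v!\,(n - v - 1)!}\,\frac{(\alpha - 1)^{(v)}\beta^{(n-v-1)}}{(N - 1)^{(n-1)}} = \frac{n\alpha}{N}\sum_{v=0}^{n-1}(v + 1)^{k-1}P'_v,$$

whence, expanding the binomial and summing term by term, 

$$m_k = \frac{n\alpha}{N}\left\{\nu_{k-1} + {}_{k-1}C_1\,\nu_{k-2} + {}_{k-1}C_2\,\nu_{k-3} + \cdots + 1\right\}. \tag{2}$$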


By repeated use of (1) and (2), we can obtain quite readily the moments 
about the origin for the distribution of Table I. It follows by definition that 

$$m_0 = 1,$$

and, similarly, 

$$\nu_0 = 1.$$

Setting k = 1 in (2), we have 

$$m_1 = \frac{n\alpha}{N}\,\nu_0 = n\alpha/N.$$

Setting k = 2, and then using (1), we obtain 

$$m_2 = \frac{n\alpha}{N}\{\nu_1 + \nu_0\} = \frac{n\alpha}{N}\left\{\frac{(n-1)(\alpha-1)}{N-1} + 1\right\} = \frac{n^{(2)}\alpha^{(2)}}{N^{(2)}} + \frac{n\alpha}{N}.$$
In a similar manner, 

$$m_3 = \frac{n\alpha}{N}\{\nu_2 + 2\nu_1 + \nu_0\} = \frac{n\alpha}{N}\left\{\frac{(n-1)^{(2)}(\alpha-1)^{(2)}}{(N-1)^{(2)}} + \frac{3(n-1)(\alpha-1)}{N-1} + 1\right\} = \frac{n^{(3)}\alpha^{(3)}}{N^{(3)}} + \frac{3n^{(2)}\alpha^{(2)}}{N^{(2)}} + \frac{n\alpha}{N}.$$

The coefficients are seen to follow the same law as for the binomial distri-
bution. As a matter of fact, if we replace $\alpha^{(v)}/N^{(v)}$ by $p^v$ in the above m’s, 
we obtain precisely the corresponding formulae for the binomial distribution. 
The coefficients for some of the higher moments are 

$m_4$: {1, 6, 7, 1} 
$m_5$: {1, 10, 25, 15, 1} 
$m_6$: {1, 15, 65, 90, 31, 1}. 

The moments about the arithmetic mean can now be determined from the 
foregoing m’s by means of the semi-recursion formula 

$$\mu_k = m_k - {}_kC_1\,\mu_{k-1}m_1 - {}_kC_2\,\mu_{k-2}m_1^2 - \cdots. \tag{3}$$

I have tried several formulae for this purpose, but it seems impossible to avoid 
a great deal of tedious reduction. Since the reduction in any case only involves 
algebraic manipulation, the details will be omitted. The formulae for the first 
few moments follow: 

$$\mu_2 = npq\,\frac{N - n}{N - 1},$$

$$\mu_3 = npq(q - p)\,\frac{(N - n)(N - 2n)}{(N - 1)(N - 2)}.$$

If the higher moments are required in a practical problem, it appears to be the 
best course to first calculate the values of the m’s, and then use (3). 
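
The continuation method lends itself to a short recursive program. The Python sketch below uses exact rational arithmetic; the function names `pmf`, `moment_direct`, `moment_recursive` and `central_moment` are hypothetical, and `a`, `b` stand for α and β. It computes $m_k$ by relations (1) and (2), checks the result against direct summation over the distribution, and then applies (3) to obtain moments about the mean.

```python
from fractions import Fraction
from math import comb

def pmf(v, a, b, n):
    """P_v: probability of exactly v white balls in n draws from a white and b black."""
    return Fraction(comb(a, v) * comb(b, n - v), comb(a + b, n))

def moment_direct(k, a, b, n):
    """m_k by direct summation of v^k P_v."""
    return sum(Fraction(v) ** k * pmf(v, a, b, n) for v in range(n + 1))

def moment_recursive(k, a, b, n):
    """m_k = F(a, b, n, N) from (2), with nu_j = F(a - 1, b, n - 1, N - 1) by (1)."""
    if k == 0:
        return Fraction(1)
    if n == 0 or a == 0:
        return Fraction(0)  # the variate is then certainly zero
    inner = sum(comb(k - 1, j) * moment_recursive(k - 1 - j, a - 1, b, n - 1)
                for j in range(k))
    return Fraction(n * a, a + b) * inner

def central_moment(k, a, b, n):
    """mu_k from the semi-recursion formula (3)."""
    m1 = moment_recursive(1, a, b, n)
    mu = [Fraction(1)]
    for j in range(1, k + 1):
        mj = moment_recursive(j, a, b, n)
        mu.append(mj - sum(comb(j, i) * mu[j - i] * m1 ** i for i in range(1, j + 1)))
    return mu[k]

a, b, n = 5, 7, 4
print([moment_recursive(k, a, b, n) for k in range(4)])
print(central_moment(2, a, b, n), central_moment(3, a, b, n))
```

With exact fractions the recursive and direct values agree identically, so the closed forms for $\mu_2$ and $\mu_3$ can be compared without rounding error.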


THE UNIVERSITY OF NEW MEXICO. 





ERRATA 


The following changes should be made in my paper entitled: ‘‘On the Probability Theory 
of Arbitrarily Linked Events’’ (These Annals, Vol. IX, 1938): 


Page 262, after 7th line from top: insert --- . 

Page 262, 18th line from top: for [11] read [10]. 

Page 263, 5th and 6th lines from bottom: for ‘‘reasonably be assumed”’ read ‘‘easily be 
proved’’. 

Page 264, 4th, 13th and 14th lines from top: replace each a, by an. 

Page 265, in (24) replace fag by Sag. 

Page 267, 7th line from top: after 2?*/(k!2*) insert “‘(See [10], p. 13)’’. 

Page 267, 11th line from top: for ¢(x) read ¢(z). 

Page 268, 4th line from bottom: for (7) read (11). 

Page 271, in (47): replace 2nd and 3rd lines by 


1 1 
ties E oe” See a 


HILDA GEIRINGER 








